

[Preprint]. 2023 Dec 8:2023.12.07.570727. [Version 1] doi: 10.1101/2023.12.07.570727

ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction

Pascal Notin 1,†,*, Aaron W Kollasch 2, Daniel Ritter 3, Lood van Niekerk 4, Steffanie Paul 5, Hansen Spinner 6, Nathan Rollins 7, Ada Shaw 8, Ruben Weitzman 9, Jonathan Frazer 10, Mafalda Dias 11, Dinko Franceschi 12, Rose Orenbuch 13, Yarin Gal 14, Debora S Marks 15
PMCID: PMC10723403  PMID: 38106144

Abstract

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We integrate a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) into a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, and model predictions, and develop a user-friendly website that facilitates data access and analysis.

1. Introduction

Proteins carry out a wide range of functions in nature, facilitating chemical reactions, transporting molecules, signaling between cells, and providing structural support to cells and organisms. This astonishing functional diversity is uniquely encoded in their amino acid sequence. For instance, the number of possible arrangements for a 64-residue peptide chain (20^64) is already larger than the estimated number of atoms in the universe. Despite substantial progress in sequencing over the past two decades, we have observed a relatively small, biased portion of that massive sequence space. Consequently, the ability to manipulate and optimize known sequences and structures presents tremendous opportunities to address pressing issues in climate, agriculture and healthcare.

The design of novel, functionally optimized proteins presents several challenges. It begins with learning a mapping between protein sequences or structures and their resulting properties. This mapping is often conceptualized as a “fitness landscape”, a multivariate function that characterizes the relationship between genetic variants and their adaptive fitness. The more accurately and comprehensively we can define these landscapes, the better our chances of predicting the effects of mutations and designing proteins with desirable and diverse properties. Machine learning, by modeling complex, high-dimensional relationships, has emerged as a powerful tool for learning these fitness landscapes. In recent years, a plethora of machine learning methods have been proposed for protein modeling, each promising to offer new insights into protein function and design. However, assessing the effectiveness of these methods has proven challenging. A key issue is their evaluation on distinct and relatively sparse benchmark datasets, while relative model performance varies substantially across experimental assays, as shown in several prior analyses [Riesselman et al., 2018, Laine et al., 2019, Meier et al., 2021]. This situation underscores the importance of scale in the benchmarks used. Larger, more diverse datasets would offer a more robust and comprehensive evaluation of model performance.

To address these limitations, we introduce ProteinGym, a large-scale set of benchmarks specifically tailored to protein design and fitness prediction. It comprises a broad collection of over 250 standardized Deep Mutational Scanning (DMS) assays which include over 2.7 million mutated sequences across more than 200 protein families, spanning different functions, taxa and depth of homologous sequences. It also encompasses clinical benchmarks providing high-quality annotations from domain experts about the effects of ~65k substitution and indel mutations in human genes (§ 3).

We have designed ProteinGym to be an effective, holistic, robust, and user-friendly tool. It provides a structured evaluation framework that factors in known limitations of the underlying experimental methods and includes metrics that are tailored to protein design and mutation effect prediction (§ 4). We report the performance in a unified benchmark of over 70 diverse high-performing models from various subfields of computational biology (e.g., mutation effect prediction, sequence-based models for de novo design, inverse folding), thereby supporting novel comparisons across subfields. Unlike prior benchmarks, ProteinGym integrates both the zero-shot and supervised settings, leading to new insights (§ 5). All models are codified with a common interface in the same open-source codebase, promoting consistency and ease of use. Lastly, a dedicated website offers an interactive platform to facilitate comparisons across datasets and performance settings.

2. Related Work and Background

Multi-task protein benchmarks

In recent years, several benchmarks have been introduced to provide initial means to assess protein model performance across a multitude of tasks of interest, e.g., predicting contacts, structure, thermostability, and fitness. These benchmarks are generally geared towards assessing the quality of learned protein representations, and the extent to which these representations can be broadly leveraged for various tasks. However, for fitness prediction, they all rely on a very limited set of proteins (e.g., 1–3 assays). In comparison, the ProteinGym benchmarks focus on a single task – fitness prediction – and encompass two orders of magnitude more assessed point mutations and a far greater diversity of protein families.

TAPE (Tasks Assessing Protein Embeddings) [Rao et al., 2019] covers five protein prediction tasks, each designed to test a different aspect of protein function and structure prediction (secondary structure, contact, remote homology, fluorescence and stability), and focuses on assessments in the semi-supervised regime via carefully curated train-validation-test splits. ProteinGLUE [Capel et al., 2022] also focuses on assessing the usefulness of learned protein representations on supervised downstream tasks. It is comprised of five different tasks, none directly related to protein fitness: secondary structure, solvent accessibility, protein-protein interactions, epitope region and hydrophobic patch prediction. PEER [Xu et al., 2022] also focuses on multi-task benchmarking, grouped in five categories: protein property, localization, structure, protein-protein interactions and protein-ligand interactions. It contains a richer set of evaluations compared with the prior two benchmarks, and also investigates the multi-task learning setting, but is not designed for thorough fitness prediction benchmarking (3 fitness related assays). The handful of fitness DMS assays from these various benchmarks are all subsumed in ProteinGym.

Single task, non-fitness datasets & benchmarks

Efforts to create fair, large-scale, and comprehensive benchmarks have been a significant focus of computational biologists for certain tasks. Among these, the biennial Critical Assessment of protein Structure Prediction (CASP) [Kryshtafovych et al., 2021] is the most renowned. CASP concentrates on protein structure prediction and has set the gold standard in this domain. In parallel to CASP, the Critical Assessment of Functional Annotation (CAFA) [Zhou et al., 2019] challenge provides a platform for evaluating protein function classification. The SKEMPI [Moal and Fernández-Recio, 2012] database is specifically designed to aid the evaluation of computational methods predicting the effect of mutations on protein-protein binding affinity. Several datasets have been curated for specific properties of interest across a diverse set of proteins, for instance thermostability [Tsuboyama et al., 2022, Stourac et al., 2020, Chen et al., 2022] or solubility [Hon et al., 2020].

Protein fitness benchmarks

Closest to our work are the collections of DMS assays that were curated in Hopf et al. [2017] (28 substitution assays), and then further expanded upon in Riesselman et al. [2018] (42 substitution assays) and Shin et al. [2021] (4 indel assays). We include all assays related to fitness prediction from these prior works in ProteinGym. FLIP [Dallago et al., 2021] focused on comparing fitness predictors in the semi-supervised setting, developing a robust evaluation framework and curating cross-validation schemes for three assays. MaveDB [Rubin et al., 2021] is a repository rather than a benchmark, but it compiles a large collection of datasets from multiple variant effect mapping experiments that can be used for benchmarking purposes. An initial prototype of the ProteinGym benchmarks (referred to as ‘ProteinGym v0.1’) was introduced in Notin et al. [2022a]. We have since then significantly expanded the benchmarks in terms of number and diversity of underlying datasets, baselines, evaluation framework and model training regimes (Table A1). This not only enables performance evaluation at an unprecedented scale, but also builds connections between different subfields that are often perceived as separate, as we discuss in the following paragraph.

Clinical Benchmarks

Designing an unbiased, non-circular and broadly applicable benchmark to evaluate the performance of human variant effect predictors at predicting clinical significance is still an open problem for the clinical community. Combining DMS with clinical annotations has been a fruitful direction to avoid biases [Frazer et al., 2021, Livesey and Marsh, 2023]. ClinGen curated a clinical dataset specifically designed to compare a subset of models [Pejaver et al., 2022].

Relationship between protein fitness, mutation effect prediction and design

The protein fitness landscape refers to the mapping between genotype (e.g., the amino acid sequence) and phenotype (e.g., protein function). While it is a fairly broad concept, it should always be thought about in practice within a particular context (e.g., stability at a given temperature in a specific organism). Models that learn the protein fitness landscape have been shown to be effective at predicting the effects of mutations [Frazer et al., 2021, Jagota et al., 2022, Brandes et al., 2023, Notin et al., 2022b]. But the ability to distinguish functional from non-functional sequences is also critical to protein engineering efforts [Romero et al., 2012, Yang et al., 2018, Wu et al., 2019, Alley et al., 2019b]. Although typically introduced in the context of de novo protein design [Huang et al., 2016], inverse folding methods [Ingraham et al., 2019, Jing et al., 2020, Dauparas et al., 2022, Gao et al., 2022] can also be used for mutation effect prediction (Appendix A.4.1). There is thus a very tight connection between protein fitness, mutation effect prediction and protein engineering, and the same models can be used for either task depending on context. We seek to illustrate this connection through this work, comparing baselines introduced in different fields (e.g., protein representation learning, inverse folding models, co-evolution models) on the same benchmarks, and including different metrics that are geared more toward mutation effect prediction (e.g., Spearman) or design tasks (e.g., NDCG).

3. ProteinGym benchmarks

ProteinGym is a collection of benchmarks (Fig. 1) that cover different types of mutations (i.e., substitutions vs. indels), ground-truth labels (i.e., experimental measurements from DMS vs. clinical annotations), and model training regimes (i.e., zero-shot vs. supervised).

Figure 1: ProteinGym benchmarks.


ProteinGym is comprised of three layers. The data layer encompasses two complementary ground truth labels from DMS assays and clinical annotations from experts. For both, we analyze two types of mutations: substitutions and indels. The model layer is comprised of a diverse set of baselines, tailored to both zero-shot and supervised training regimes. Lastly, the analytics layer includes several performance metrics geared towards fitness prediction or protein design evaluation. Different segmentation variables (e.g., MSA depth, assayed phenotype, taxa) facilitate the comparisons of models across diverse settings

3.1. Mutation types

We curate benchmarks for two types of protein mutations – substitutions and indels (insertions or deletions), each with unique implications for the structure, function, and modeling of proteins.

Substitutions

Substitution mutations refer to a change in which one amino acid in a protein sequence is replaced by another. Depending on the properties of the substituted amino acid, this can have varied impacts on the protein’s structure and function, which can range from minimal to drastic. The influence of a substitution largely depends on whether it is conservative (i.e., the new amino acid shares similar properties to the original) or non-conservative. In terms of computational modeling, substitutions are the most commonly addressed mutation type, and the majority of mutation effect predictors support substitutions.

Indels

Indel mutations correspond to insertions or deletions of amino acids in protein sequences. While indels can affect protein fitness in similar ways to substitutions, they can also have profound impacts on protein structure by altering the protein backbone, causing structural modifications inaccessible through substitutions alone [Shortle and Sondek, 1995, Tóth-Petróczy and Tawfik, 2013]. From a computational perspective, indels present a unique challenge because they alter the length of the protein sequence, requiring additional considerations in model design and making it more difficult to align sequences. For instance, the majority of models trained on Multiple Sequence Alignments are typically unable to score indels due to the fixed coordinate system they operate within (see § 4). Furthermore, when dealing with probabilistic models, comparing relative likelihoods of sequences with different lengths results in additional complexities and considerations.
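To make the length issue concrete, one common mitigation (our illustration here, not a prescription from this benchmark) is to compare length-normalized, i.e., per-residue, log-likelihoods rather than raw sums. The helper below assumes per-position log-probabilities have already been obtained from some autoregressive model; the inputs shown are hypothetical:

```python
def per_residue_log_likelihood(log_probs):
    """Length-normalize a sequence's total log-likelihood.

    `log_probs` is a list of per-position log-probabilities, e.g. from
    an autoregressive protein language model (hypothetical values here).
    """
    return sum(log_probs) / len(log_probs)

# A deletion mutant vs. wild type: raw log-likelihood sums are not
# directly comparable across lengths, but per-residue averages are.
wt = [-1.2, -0.8, -1.5, -0.9, -1.1]   # length 5
mut = [-1.3, -0.9, -1.4]              # length 3 (2-residue deletion)
delta = per_residue_log_likelihood(mut) - per_residue_log_likelihood(wt)
```

A positive `delta` would then indicate that the model assigns the mutant a higher average per-residue likelihood than the wild type, under this particular normalization choice.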

3.2. Dataset types

The fitness of a protein is a measure of how well a protein can perform its function within an organism. Factors that influence protein fitness are diverse and include stability, folding efficiency, catalytic activity (for enzymes), binding specificity and affinity. To properly capture this diversity, we curated a broad set of experimental assays that map a given sequence to phenotypic measurements that are known or hypothesized to be related to its fitness. We focused on two potential sources of ground truth: Deep Mutational Scanning (DMS) assays and Clinical datasets.

Deep Mutational Scanning assays

Modeling protein fitness landscapes presents a challenge due to the complex relationship between experimentally measured protein fitness, the distribution of natural sequences, and the underlying fitness landscape. It is challenging to isolate a singular, measurable molecular property that reflects the key aspects of fitness for a given protein. In developing ProteinGym, we prioritized assays where the experimentally measured property for each mutant protein is likely to represent the role of the protein in organismal fitness. The resulting compilation of over 250 DMS assays extends over a wide array of functional properties, including ligand binding, aggregation, thermostability, viral replication, and drug resistance. It encompasses diverse protein families, such as kinases, ion channel proteins, G-protein coupled receptors, polymerases, transcription factors, and tumor suppressors. In contrast to most DMS assay collections that focus exclusively on single amino acid substitutions, ProteinGym includes several assays with multiple amino acid variants. Moreover, it spans different taxa (i.e., humans, other eukaryotes, prokaryotes, and viruses), alignment depths, and mutation types (substitutions vs indels). All details about the curation and pre-processing of these DMS assays are provided in Appendix A.3.

Clinical datasets

ClinVar [Landrum and Kattman, 2018] is an extensive, public database developed by the National Center for Biotechnology Information (NCBI). It serves as an archival repository that collects and annotates reports detailing the relationships among human genetic variations and associated phenotypes with relevant supporting evidence, thereby providing robust, clinically annotated datasets that are invaluable for understanding the functional impact of mutations. From the standpoint of benchmarking mutation effect predictors, ClinVar permits the direct comparison of predictive models in terms of their accuracy in estimating the functional impact of mutations on human health. Annotations are also available for an order of magnitude more distinct proteins compared with our DMS-based benchmarks, albeit much sparser per protein (see Table 1). In the case of indels, we focused on short (≤3 amino acids) variants. In ClinVar, 84% of indel annotations are pathogenic, so we added to our clinical dataset common indels from gnomAD (allele frequency >5%) as pseudocontrols [Karczewski et al., 2020].

Table 1: ProteinGym datasets summary.

ProteinGym includes a large collection of DMS assays and clinical datasets that offer complementary viewpoints when assessing protein fitness. The table reports the number of mutants and unique proteins per dataset (totals are deduplicated across datasets).

Dataset Description Mutation type # Proteins # Mutants
DMS High-throughput assays evaluating the functional impact of a wide range of protein mutations Substitutions 217 2.4M
Indels 66 0.3M
Clinical Expert-curated clinical annotations across a wide range of human genes Substitutions 2,525 63k
Indels 1,555 3k
Total 3,422 2.7M

3.3. Model training regime

Lastly, we discriminate in our benchmarks between zero-shot and supervised settings. In the supervised regime we are allowed to leverage a subset of labels to train a predictive model, while in the zero-shot setting we seek to predict the effects of mutations on fitness without relying on the ground-truth labels for the protein of interest. These two settings offer complementary viewpoints of practical importance. For instance, in settings where labels are subject to several biases or scarcely available (e.g., labels for rare genetic pathologies), we need methods with robust zero-shot performance. In cases where we seek to design new proteins that simultaneously optimize several properties of interest (e.g., binding affinity, thermostability) and we have collected a sufficiently large number of labels for each target, supervised methods are more appropriate. The need to rely on labels is even more pronounced when we seek to optimize several anti-correlated properties or when evolution is a poor proxy for the property of interest. Predictions obtained in the zero-shot setting may also be used to augment supervised models [Hsu et al., 2022a]. The two settings require substantially different evaluation frameworks, which we detail in § 4.

4. Evaluation framework

4.1. Zero-shot benchmarks

DMS assays

In the zero-shot setting we predict experimental phenotypic measurements from a given assay, without having access to the labels at training time. Due to the often non-linear relationship between protein function and organism fitness [Boucher et al., 2016], the Spearman’s rank correlation coefficient is the most generally appropriate metric for model performance on experimental measurements. We use this metric similarly to previous studies [Hopf et al., 2017, Riesselman et al., 2018, Meier et al., 2021]. However, in situations where DMS measurements exhibit a bimodal profile, rank correlations may not be the optimal choice. Consequently, for these instances, we supplement our performance assessment with additional metrics, namely the Area Under the ROC Curve (AUC), and the Matthews Correlation Coefficient (MCC), which compare model scores with binarized experimental measurements. Furthermore, for certain goals (e.g., optimizing functional properties of designed proteins), it is more important that a model is able to correctly identify the most functional proteins, rather than capture the distribution of the full data. Thus, we also calculate the Normalized Discounted Cumulative Gain (NDCG), which up-weights a model if it gives its highest scores to sequences with the highest DMS values. We also calculate Top K Recall, where we select K to be the top 10% of DMS values. To avoid placing too much weight on properties where we have many assays (e.g., thermostability), we first compute each of these metrics within groups of assays that measure similar functions. The final value of the metric is then the average of these averages, giving each functional group equal weight.
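The aggregation logic described above can be sketched as follows. The function names are ours, and the Spearman implementation omits tie correction for brevity; the released ProteinGym codebase may differ in these details:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (simple sketch, no tie correction)."""
    rx = np.argsort(np.argsort(x))  # ranks of x
    ry = np.argsort(np.argsort(y))  # ranks of y
    return float(np.corrcoef(rx, ry)[0, 1])

def top_k_recall(scores, labels, frac=0.10):
    """Fraction of the true top-`frac` sequences that the model also
    places in its own top-`frac` (Top K Recall with K = top 10%)."""
    k = max(1, int(len(labels) * frac))
    top_true = set(np.argsort(labels)[-k:].tolist())
    top_pred = set(np.argsort(scores)[-k:].tolist())
    return len(top_true & top_pred) / k

def grouped_average(per_assay_metrics, groups):
    """Average of per-group averages, so that each functional group
    (e.g., thermostability) carries equal weight in the final score."""
    by_group = {}
    for m, g in zip(per_assay_metrics, groups):
        by_group.setdefault(g, []).append(m)
    return float(np.mean([np.mean(v) for v in by_group.values()]))
```

For example, `grouped_average([0.2, 0.4, 0.9], ["stability", "stability", "binding"])` averages the two stability assays first, so the single binding assay is not outweighed.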

Clinical datasets

For the clinical data, with pathogenic and benign categories, we calculate the areas under the ROC and precision-recall curves. In the substitution dataset, 50% of the labels are in approximately 10% of the proteins. Since clinical labels across genes correspond to underlying pathologies that are very distinct from one another, it is preferable to assess performance on a gene-by-gene basis. We thus compute the average per-gene performance on the substitution benchmark. However, in the case of indels, only about half of the proteins have a pathogenic label (and only 10% have both pathogenic and benign or pseudocontrol labels), so we compute the total AUC for the full dataset. The problem of calibrating model scores in a principled way across different genes is an open problem; we leave this to future work.
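A minimal sketch of the per-gene averaging scheme, using the rank-based (Mann–Whitney) formulation of ROC AUC. The helper names and the rule of skipping genes that lack both classes are our illustrative assumptions:

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) formulation: the
    fraction of (pathogenic, benign) pairs ordered correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

def mean_per_gene_auc(scores, labels, genes):
    """Average AUC across genes, skipping genes that do not have
    both a pathogenic and a benign (or pseudocontrol) label."""
    per_gene = []
    for g in set(genes):
        idx = [i for i, x in enumerate(genes) if x == g]
        s = [scores[i] for i in idx]
        l = [labels[i] for i in idx]
        if 0 < sum(l) < len(l):  # needs both classes present
            per_gene.append(auc(s, l))
    return float(np.mean(per_gene))
```

Averaging per gene rather than pooling all labels prevents a few heavily annotated genes from dominating the benchmark, matching the motivation given above.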

Baselines

We implement a diverse set of 50+ zero-shot baselines that may be grouped into alignment-based models, protein language models, inverse folding models and ‘hybrid’ models. Alignment-based models, such as site-independent and EVmutation models [Hopf et al., 2017], DeepSequence [Riesselman et al., 2018], WaveNet [Shin et al., 2021], EVE [Frazer et al., 2021] and GEMME [Laine et al., 2019], are trained on Multiple Sequence Alignments (MSAs). Protein language models are trained on large quantities of unaligned sequences across protein families. They include UniRep [Alley et al., 2019a], the RITA suite [Hesslow et al., 2022], the ESM1 and ESM2 suite [Rives et al., 2021, Meier et al., 2021, Lin et al., 2023], VESPA [Marquet et al., 2022], the CARP suite [Yang et al., 2023a] and the ProGen2 suite [Nijkamp et al., 2022]. Inverse Folding models learn sequence distributions conditional on an input structure [Ingraham et al., 2019]. We include here ProteinMPNN [Dauparas et al., 2022] which is trained on structures in the PDB, MIF [Yang et al., 2023b] trained on CATH4.2 [Dawson et al., 2016], and ESM-IF1 [Hsu et al., 2022b] which is trained on the PDB and a dataset of Alphafold2 folded structures. Hybrid models combine the respective strengths of family-specific alignment-based and family-agnostic language models, such as the MSA Transformer [Rao et al., 2021], evotuned UniRep [Alley et al., 2019a], Tranception [Notin et al., 2022a] and TranceptEVE [Notin et al., 2022b].

Because of the variable length of sequences subject to insertion or deletion mutations, alignment-based methods with fixed matrix representations of sequences are unable to score indels. However, profile Hidden Markov Models (HMMs) and autoregressive models include explicit or implicit probabilities of indels at each position. Both are trained on homologous sequences recovered with an MSA and expanded to include insertions. The masked-marginals heuristic of Meier et al. [2021], used to predict protein fitness with protein language models trained with a masked-language modeling objective (e.g., ESM-1v, MSA Transformer), does not support indels (see Appendix A.4). We thus only report the performance of the following baselines: Tranception [Notin et al., 2022a], TranceptEVE [Notin et al., 2022b], WaveNet [Shin et al., 2021], HMM [Eddy, 2011], ProGen2 [Nijkamp et al., 2022], UniRep [Alley et al., 2019a], RITA [Hesslow et al., 2022] and ProtGPT2 [Ferruz et al., 2022].

For comparisons on clinical benchmarks, we also include unsupervised baselines developed for variant effect prediction in humans, such as SIFT [Ng and Henikoff, 2002], MutPred [Li et al., 2009], LRT [Chun and Fay, 2009], MutationAssessor [Reva et al., 2011], PROVEAN [Choi et al., 2012], PrimateAI [Sundaram et al., 2018] and LIST-S2 [Malhis et al., 2020].

4.2. Supervised benchmarks

DMS assays

We leverage the same set of 250+ substitution and indel DMS assays as for the zero-shot setting. In the supervised setting, greater care should be dedicated to mitigating overfitting risks, as the observations in biological datasets may not be fully independent. For instance, two mutations involving amino acids with similar biochemical properties at the same position will tend to produce similar effects. If we train on one of these mutations and test on the other, we will tend to overestimate our ability to predict the effects of mutants at unseen positions. In order to quantify the ability of each model to extrapolate to positions unseen at training time, we leverage three types of cross-validation schemes introduced in Notin et al. [2023]. In the Random scheme, each mutation is randomly assigned to one of five different folds. In the Contiguous scheme, we split the sequence contiguously along its length, in order to obtain 5 segments of contiguous positions, and assign mutations to each segment based on the position at which each occurs. Lastly, in the Modulo scheme, we assign positions to each fold using the modulo operator to obtain 5 folds overall. In all supervised settings, we report both the Spearman’s rank correlation and Mean Squared Error (MSE) between predictions and experimental measurements. A more challenging generalization task would involve learning the relationship between protein representation (sequence, structure, or both) and function using only a handful of proteins, and then extrapolating at inference time to protein families not encountered during training. This setting may be seen as a hybrid between the zero-shot and supervised regimes – closer to zero-shot if we seek to predict different properties across families, and closer to the supervised setting if the properties are similar (e.g., predicting the thermostability of proteins with low sequence similarity to those in the training set).
While this study does not delve into these hybrid scenarios, the DMS assays in ProteinGym can facilitate such analyses.
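The three fold-assignment schemes can be sketched as follows. This is a simplified illustration; the fold count, segment-size rounding, and function names are our own choices and need not match the released code exactly:

```python
import random

def assign_folds(positions, scheme, n_folds=5, seed=0):
    """Assign mutated positions (0-indexed) to cross-validation folds
    under the Random, Contiguous, or Modulo scheme."""
    if scheme == "random":
        # Each mutation lands in a uniformly random fold.
        rng = random.Random(seed)
        return [rng.randrange(n_folds) for _ in positions]
    if scheme == "contiguous":
        # Split the sequence into n_folds contiguous segments and
        # assign each mutation by the segment its position falls in.
        length = max(positions) + 1
        seg = -(-length // n_folds)  # ceiling division: segment size
        return [min(p // seg, n_folds - 1) for p in positions]
    if scheme == "modulo":
        # Position p goes to fold p mod n_folds.
        return [p % n_folds for p in positions]
    raise ValueError(f"unknown scheme: {scheme}")
```

Under the Contiguous and Modulo schemes, every position in a held-out fold is entirely unseen during training, which is what makes them a test of extrapolation to new positions rather than interpolation.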

Clinical datasets

Given the restrictions on the number of labels available per gene and the discrepancies between train-validation-test splits across the different supervised baselines, we report test performance on the full set of all available ClinVar labels. We note that this may result in overestimating the performance of supervised methods for which the training data would substantially overlap with the labels considered in our ClinVar set. Further data leakage occurs for models trained on population frequencies, as most ClinVar benign labels are established based on observed frequencies in humans (a situation especially evident for our indel dataset, where we use frequent variants as pseudocontrols). Interestingly, despite this overfitting risk and as first observed in Frazer et al. [2021], we find that most supervised methods are outperformed by the best unsupervised methods (Fig. 2).

Figure 2: Comparing baselines across datasets and across performance metrics.

Figure 2:

(A) Performance estimated against known clinical labels (avg. AUC over genes in ClinVar (x axis)), and DMS assays assessing the clinical effect of variants in humans (avg. Spearman (y axis)). (B) The zero-shot models’ median NDCG@10% (x-axis) against median Spearman (y-axis) on the DMS substitutions.

Baselines

For the supervised DMS benchmark, we report two suites of baselines. The first suite is comprised of models that take as inputs One-Hot-Encoded (OHE) features. Following the protocol described in Hsu et al. [2022a], we augment the model inputs with predictions from several state-of-the-art zero-shot baselines: DeepSequence [Riesselman et al., 2018], ESM-1v [Meier et al., 2021], MSA Transformer [Rao et al., 2021], Tranception [Notin et al., 2022a] and TranceptEVE [Notin et al., 2022b]. Following prior works from the semi-supervised protein modeling literature [Heinzinger et al., 2019, Dallago et al., 2021], the second suite is formed with baselines that leverage mean-pooled embeddings from several protein language models (ESM-1v, MSA Transformer and Tranception) in lieu of OHE features. We also augment these baselines with zero-shot predictions obtained with the same model used to extract the protein sequence embeddings. Lastly, we include ProteinNPT [Notin et al., 2023], a semi-supervised pseudo-generative architecture which jointly models sequences and labels by performing axial attention [Ho et al., 2019b, Kossen et al., 2022] on input labeled batches. Additional details for the corresponding model architectures are reported in Appendix A.4.2. On the various clinical benchmarks, the above baselines are challenging to train given the low number of labels available per gene. We instead include several supervised baselines that have been specifically developed for variant effect prediction in humans, such as ClinPred [Alirezaie et al., 2018], MetaRNN [Li et al., 2022], BayesDel [Feng, 2017], REVEL [Ioannidis et al., 2016] and PolyPhen-2 [Adzhubei et al., 2010] (full list in A.4.3).
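The augmented-input idea from the first suite can be illustrated with a small sketch: one-hot sequence features concatenated with a zero-shot prediction column. The 20-letter alphabet constant, helper names, and the choice of closed-form ridge regression as the downstream regressor are our assumptions for illustration, not the exact setup of Hsu et al. [2022a]:

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # standard amino acid alphabet

def one_hot(seq):
    """Flatten a protein sequence into a one-hot feature vector."""
    x = np.zeros((len(seq), len(AA)))
    for i, aa in enumerate(seq):
        x[i, AA.index(aa)] = 1.0
    return x.ravel()

def augmented_features(seqs, zero_shot_scores):
    """OHE features with an extra column holding each sequence's
    zero-shot fitness prediction (the augmentation step)."""
    X = np.stack([one_hot(s) for s in seqs])
    z = np.asarray(zero_shot_scores, dtype=float)[:, None]
    return np.hstack([X, z])

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With equal-length mutant sequences and their zero-shot scores, `ridge_fit(augmented_features(seqs, scores), labels)` yields a supervised predictor that can fall back on the zero-shot signal where labeled data is thin.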

5. Results

5.1. Substitution benchmarks

We follow the experimental protocol described in § 4.1 and report our main results on the zero-shot DMS benchmarks in Table 2, the supervised DMS benchmark in Table 3, and the combined supervised and unsupervised clinical benchmarks in Fig. 2A. TranceptEVE emerges as the best overall method across the various settings. One of the key objectives of the ProteinGym benchmarks is to analyze performance across a wide range of regimes to guide model selection depending on the objectives of practitioners. To that end we also provide a performance breakdown across MSA depth, mutational depth and taxa where relevant (see Appendix A.5 and supplements). While TranceptEVE tops the ranking across the majority of metrics and settings, GEMME achieves the best performance in several categories, such as assays of viral or non-human eukaryotic proteins, and low- and medium-depth MSAs. While we report average performance per metric in Table 2, the distribution of scores across assays is also insightful. For instance, certain models are heavily penalized in aggregate rankings due to very poor performance on a handful of assays (e.g., ESM-1v), such that looking at the median performance in lieu of the average provides a complementary viewpoint. Furthermore, although most models rank similarly under Spearman and NDCG, some have comparatively better performance in one over the other (Fig. 2B). Superior ranking under NDCG may suggest a model is better at predicting the top end of a score distribution, which may be a desirable feature when using models for design and optimization. Many of the alignment-based methods (e.g., EVmutation, WaveNet and DeepSequence) exhibit this behavior (Fig. A1). Models with higher relative Spearman (e.g., ESM-1v and ESM-2) may be more effective for cases where the model needs to learn the full property distribution well, such as with mutation effect prediction.
Lastly, in the zero-shot setting, autoregressive protein language models (e.g., Tranception, ProGen2) tend to outperform their masked language modeling (MLM) counterparts (e.g., ESM models). However, in supervised settings, both types of models provide valuable embeddings for learning, and the optimal method depends on the specific situation, as observed in Table 3 and Table A16. The best performance is achieved with the ProteinNPT architecture, demonstrating the value of alternately performing self-attention across columns (i.e., amino acid tokens and labels) and rows (i.e., protein sequences) to learn a rich representation of the data.
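To make the contrast between the two metrics concrete, the following sketch computes Spearman's rank correlation and a simplified NDCG over the top fraction of predictions for toy data; `ndcg_top_frac` is our own simplification for illustration, not the benchmark's exact implementation.

```python
import numpy as np
from scipy.stats import spearmanr

def ndcg_top_frac(y_true, y_pred, frac=0.1):
    """Simplified NDCG@frac: gains are min-shifted experimental values,
    ranked by model score and log-discounted (illustrative only)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    k = max(1, int(len(y_true) * frac))
    gains = y_true - y_true.min()              # non-negative gains
    order = np.argsort(-y_pred)[:k]            # top-k by model score
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (gains[order] * discounts).sum()
    ideal = (np.sort(gains)[::-1][:k] * discounts).sum()
    return dcg / ideal if ideal > 0 else 0.0

scores = np.array([0.9, 0.2, 0.7, 0.1, 0.5])     # toy model scores
fitness = np.array([1.2, 0.1, 0.8, -1.0, -0.3])  # toy DMS measurements
rho, _ = spearmanr(scores, fitness)
print(round(rho, 3), round(ndcg_top_frac(fitness, scores), 3))  # → 0.9 1.0
```

Here the model mis-ranks two low-fitness mutants (Spearman 0.9) but perfectly identifies the top-scoring variant (NDCG 1.0), mirroring the distinction discussed above.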

Table 2: Zero-shot substitution DMS benchmark.

Average Spearman’s rank correlation, AUC, MCC, NDCG@10%, and top 10% recall between model scores and experimental measurements on the ProteinGym substitution benchmark. We use ‘ens.’ as a shorthand for ensemble.

Model type Model name Spearman AUC MCC NDCG Recall
Alignment-based Site-Independent 0.361 0.697 0.288 0.746 0.201
WaveNet 0.216 0.623 0.174 0.684 0.154
EVmutation 0.397 0.717 0.306 0.775 0.220
DeepSequence (ens.) 0.422 0.731 0.330 0.775 0.227
EVE (ens.) 0.441 0.741 0.343 0.781 0.231
GEMME 0.457 0.750 0.353 0.775 0.209
Protein language UniRep 0.193 0.607 0.149 0.647 0.140
CARP (640M) 0.373 0.704 0.289 0.749 0.210
RITA XL 0.373 0.708 0.294 0.750 0.194
ProGen2 XL 0.392 0.718 0.307 0.766 0.200
ESM-1b 0.399 0.722 0.315 0.748 0.205
ESM2 (15B) 0.405 0.723 0.318 0.759 0.210
ESM-1v (ens.) 0.416 0.730 0.329 0.753 0.216
VESPA 0.437 0.743 0.348 0.774 0.201
Hybrid UniRep evotuned 0.347 0.693 0.274 0.737 0.181
MSA Transformer (ens.) 0.434 0.738 0.341 0.777 0.224
Tranception L 0.436 0.740 0.342 0.778 0.221
TranceptEVE L 0.457 0.752 0.357 0.785 0.231
Inverse Folding ProteinMPNN 0.258 0.640 0.196 0.712 0.186
MIF-ST 0.401 0.718 0.310 0.766 0.227
ESM-IF1 0.422 0.730 0.331 0.748 0.223

Table 3: Supervised substitution DMS benchmark.

Spearman’s rank correlation and MSE between model predictions and experimental measurements. MSAT is a shorthand for MSA Transformer.

Model type Model name Spearman (↑) MSE (↓)
Contig. Mod. Rand. Avg. Contig. Mod. Rand. Avg.
OHE None 0.064 0.027 0.579 0.224 1.158 1.125 0.898 1.061
DeepSequence 0.400 0.400 0.521 0.440 0.967 0.940 0.767 0.891
ESM-1v 0.367 0.368 0.514 0.417 0.977 0.949 0.764 0.897
MSAT 0.410 0.412 0.536 0.453 0.963 0.934 0.749 0.882
Tranception 0.419 0.419 0.535 0.458 0.985 0.934 0.766 0.895
TranceptEVE 0.441 0.440 0.550 0.477 0.953 0.914 0.743 0.870
Embed. ESM-1v 0.481 0.506 0.639 0.542 0.937 0.861 0.563 0.787
MSAT 0.525 0.538 0.642 0.568 0.836 0.795 0.573 0.735
Tranception 0.490 0.526 0.696 0.571 0.972 0.833 0.503 0.769
NPT ProteinNPT 0.547 0.564 0.730 0.613 0.820 0.771 0.459 0.683

5.2. Indel benchmarks

The results for an indel-compatible subset of the models in ProteinGym are shown in Table 4. The Spearman rank correlations are separated by the method used to generate test sequences: unbiased libraries, or model-designed sequences biased towards natural sequences. The medium-sized Tranception model has the highest Spearman rank correlation and AUC across all the assays, although TranceptEVE is the highest performing on the subset of assays measuring designed or natural sequences. We provide additional results on the clinical indel benchmarks in Appendix A.5.

Table 4: Zero-shot indel DMS benchmark.

Spearman’s rank correlations and AUC between model scores and experimental measurements.

Model type Model name Spearman by DMS type (↑) AUC (↑)
Library Designed/Natural All All
Alignment models HMM 0.459 0.385 0.391 0.731
WaveNet 0.427 0.289 0.285 0.667
Protein language models UniRep −0.016 0.159 0.169 0.589
RITA L 0.395 0.521 0.459 0.764
ProtGPT2 0.153 0.188 0.194 0.615
ProGen2 XL 0.369 0.471 0.434 0.749
Hybrid models Tranception L 0.449 0.443 0.399 0.742
Tranception M 0.529 0.392 0.298 0.735
TranceptEVE 0.472 0.521 0.457 0.774

6. Resources

Codebase

A key contribution of our work is the consolidation of the numerous baselines discussed in § 4 in a single open-source GitHub repository. While the main code for the majority of these baselines is publicly available, it often does not support fitness prediction out-of-the-box or, when it does, the codebase does not necessarily provide all the required data processing logic (e.g., preprocessing of MSAs in MSA Transformer) or handle all possible edge cases that may be encountered (e.g., scoring of sequences longer than context size in the ESM suite). Our GitHub repository addresses all of these gaps and provides a consistent interface that will aid in the seamless integration of new baselines as they become available.

Processed datasets

We also make publicly available all processed datasets used in our various benchmarks in a consistent format, including all DMS assays, model scores, ClinVar/gnomAD datasets, predicted 3D structures and Multiple Sequence Alignments required for training and scoring (see Section A.3.3 for more details).

Website

Lastly, we developed a user-friendly website in which all benchmarks are accessible, with functionalities supporting drill-down analyses across various dimensions (e.g., mutational depth, taxa) as well as exporting capabilities.

7. Conclusion

ProteinGym addresses the lack of large-scale benchmarks for the robust assessment of models developed for protein design and fitness prediction. It facilitates the direct comparison of methods across several dimensions of interest (e.g., MSA depth, mutational depth, taxa), based on different ground-truth datasets (e.g., DMS assays vs. clinical annotations), and in different regimes (e.g., zero-shot vs. supervised). We expect the ProteinGym benchmarks, and the various data assets we publicly release along with them, to be valuable resources for the Machine Learning and Computational Biology communities, and we plan to continue updating the benchmarks as new assays and baselines become available.

Acknowledgments and Disclosure of Funding

P.N. was supported by GSK and the UK Engineering and Physical Sciences Research Council (EPSRC iCASE award no. 18000077). Y.G. holds a Turing AI Fellowship (Phase 1) at the Alan Turing Institute, which is supported by EPSRC grant reference V030302/1. A.K., P.N., and D.S.M. are supported by a Chan Zuckerberg Initiative Award (Neurodegeneration Challenge Network, CZI2018-191853). S.P., H.S., and D.S.M. are supported by a NIH Transformational Research Award (TR01 1R01CA260415). L.vN. and R.O. gratefully acknowledge funding from CZI NDCN. R.W. is supported by the UK Engineering and Physical Sciences Research Council, as part of the CDT in Health Data Science. D.F. is supported by Wellcome Leap Inc., the Chan Zuckerberg Initiative/Silicon Valley Community Foundation, and The Coalition for Epidemic Preparedness Innovations. We thank the broader Marks Lab, in particular Javier Marchena-Hurtado, for helpful discussions when writing this manuscript. We gratefully acknowledge the compute resources provided by Invitae to train most of the EVE models that we used for the clinical benchmark.

A. Appendix

A.1. Social Impact

Protein design holds considerable promise for various fields, ranging from medicine to agriculture, and is likely to have a profound social impact. However, the development of such technology introduces several concerns, particularly relating to the dual use of protein fitness and design models. For instance, while beneficial for areas like drug design, these models can also be potentially utilized for harmful purposes such as bio-weapon design. Consider a generative model developed for therapeutic purposes: it typically penalizes predicted toxicity. Yet, the logic of this model could be inverted to instead reward for toxicity [Urbina et al., 2022]. Indeed, any tool or benchmark developed to improve protein design can be manipulated for nefarious objectives. Lastly, protein fitness models will significantly influence the way experiments are conducted. With increased adoption and development of protein design, substantial portions of experimental work can be accelerated, leading to quicker iterations and improved results. Nonetheless, the need for wet lab experimentation remains. These technological advancements will serve to augment, rather than completely supplant, traditional experimental procedures.

Additionally, the American College of Medical Genetics and Genomics (ACMG) currently gives little weight to computational predictions of variant effects, citing insufficient validation. Consequently, it is essential to create benchmarks using clinical data in order to promote the acceptance of these machine learning methods in medical practice.

A.2. Limitations

Deep mutational scans

While significant efforts have been dedicated to curating and preprocessing a diverse set of deep mutational scans (DMS), the very nature of these scans imposes biases and limitations to this benchmark:

  1. Measurement noise Experiments do not have a perfect dynamic range, often imposing a restrictive ceiling and/or floor to the measured response of mutation effects that is not meaningful for protein function and mutation effect prediction. Furthermore, noise is a perennial issue in high-throughput assays, and some assays have poor experimental replicate correlation. Taken together, this means that one cannot expect perfect correlation between experiment and model. Since these considerations affect different proteins to different extents, computing average Spearman correlations across proteins can be misleading.

  2. Bias There is additional bias in the types of proteins chosen for deep mutational scans. This can be due both to experimental limitations on which proteins’ functions can be assayed (for example, disordered proteins are challenging), and to protein prioritisation considerations (for example, viral and cancer-related proteins are over-represented).

  3. Representativeness No assay is fully representative of the impacts of protein changes on the evolutionary fitness of an organism, which typically involves a convolution of molecular functions across changing environments. In fact, many assays target only a single feature such as expression, binding, or enzymatic activity.

  4. Inconsistent processing The reported fitness effects from DMS are themselves the result of modeling and analysis of the raw data. The treatment of data is extremely heterogeneous across the community, and different analyses can lead to different conclusions on the effect of mutations. A perfectly standardised curation of experimental results would require treating all raw data with the same approach; we leave this type of analysis for future work.

Human mutation databases

ClinVar data has the advantage of covering more proteins than DMS, albeit only human proteins involved in disease. However, it has several limitations:

  1. Noise By its very community-based nature, this dataset is noisy. Filtering to more stringently curated ClinVar labels, or to more recent labels, improves correlation with predictions from sequence models [Frazer et al., 2021]. Here we decided to keep a reasonable number of clinical labels – a trade-off between quantity and quality.

  2. Bias Clinical labels are biased towards classes of proteins that are heavily studied, such as well-known cancer genes, as well as towards European ancestry.

  3. Circularity Grimm et al. [2015] details two types of circularity that hinder the evaluation of human variant effect predictors. In a supervised benchmark, there is the potential for data leakage from training to testing, even for different variants in the same protein. Even for our unsupervised benchmark, where models have not trained on clinical labels, there is the potential for another type of circularity: evolutionary conservation is one of the criteria used to classify a variant as benign or pathogenic in ClinVar.

Finally, the current benchmarks are limited to mutations in coding regions. But there are both DMS datasets and clinical labels (although fewer of them) in regulatory regions – for example in UTRs, introns, promoters. This could be an interesting direction of growth for these benchmarks.

A.3. Datasets

A.3.1. DMS assays

Evolution of protein fitness benchmarks based on DMS assays

As discussed in § 2, our DMS benchmarks build on several prior works that had compiled a growing library of such assays. We summarize their content in Table A1.

Selection and processing

We focused on several different criteria when determining which DMS assays to include in ProteinGym. These are:

  1. The public availability of data

  2. The experimental throughput (how many mutations were assayed)

  3. The level of noise between experiment replicates

  4. The dynamic range of the assay

  5. The assay type (selection, enrichment, etc) and whether or not it captures evolutionary constraints.

  6. Whether the assay used amino-acid substitutions or indel mutations (no UTR, tRNA, promoter, etc. variants were included).

Final list of assays

In-depth metadata about the assays, including the assay type, UniProt ID, MSA start and end positions, mutated positions, and target sequence, is provided under the reference_files directory in the codebase. A complete list of included assays is presented at the end of the appendix (see Tables A19 and A20).

Processing of large thermostability dataset

A large dataset of thermostability assays of 331 natural domains [Tsuboyama et al., 2023] contributed 65 assays to our list. We processed these assays as follows:

We used the set of non-redundant natural domains (referred to as Dataset #5 in the original paper). After mapping to UniProt IDs for our DMS id naming convention and removing datasets where none of the tested evolutionary models had a Spearman correlation above 0.2 (suggesting that there is inadequate evolutionary fitness signal in the stability assay, preventing meaningful comparisons between models), we were left with 65 thermostability assays of short domains (40–72 residues long). For substitutions, there was 99+% coverage of each position with 14–19 mutations per position (52 of these datasets also include multiple mutants), and for indels there was a deletion, a Gly insertion and an Ala insertion at every position.

Classification of DMS assays

We grouped the substitution DMS assays into five function types: activity, binding, expression, organismal fitness, and stability. While many DMS assays are well described by multiple classes, we assigned each to a primary class such that the classes are non-overlapping. We provide a brief description of each class in Table A2.

Cross-validation schemes

As described in § 4.2, we leverage 3 types of cross-validation schemes: ‘Random’, ‘Contiguous’ and ‘Modulo’. We only keep single substitution mutants for the corresponding analyses. For the ‘Random’ split, we randomly assigned each mutant to one of 5 folds. The ‘Contiguous’ split is obtained by splitting the sequence into contiguous segments along its length, ensuring the segments comprise the same number of positions. The ‘Modulo’ scheme is obtained by assigning positions to each fold using the modulo of the position number by the total number of folds. Therefore, for a 5-fold cross-validation, position 1 is assigned to fold 1, position 2 to fold 2, …, position 6 to fold 1, etc. Once again, we make sure to only consider mutated positions. We perform five-fold cross-validation for all assays except F7YBW8_MESOW [Aakre et al., 2015] and SPG1_STRSG [Wu et al., 2016], as these contain only 4 mutated positions.
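The three schemes can be sketched as follows. This is a simplified illustration: the function name is ours, and the ‘Random’ scheme here assigns positions rather than individual mutants as in the actual protocol.

```python
import random

def assign_folds(positions, scheme="modulo", k=5, seed=0):
    """Assign mutated positions to k cross-validation folds.

    'random' assigns each position independently; 'contiguous' splits
    the sorted positions into k equal-length segments; 'modulo' cycles
    positions through the folds (hypothetical sketch, not the repo code).
    """
    positions = sorted(positions)
    n = len(positions)
    if scheme == "random":
        rng = random.Random(seed)
        return {p: rng.randrange(k) for p in positions}
    if scheme == "contiguous":
        return {p: min(i * k // n, k - 1) for i, p in enumerate(positions)}
    if scheme == "modulo":
        return {p: i % k for i, p in enumerate(positions)}
    raise ValueError(f"unknown scheme: {scheme}")

folds = assign_folds(range(1, 11), scheme="modulo")
print([folds[p] for p in range(1, 11)])  # → [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

The ‘Contiguous’ and ‘Modulo’ schemes both keep test positions disjoint from training positions, which probes generalization to unseen sites rather than interpolation between observed mutants.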

High-level statistics

Table A3 describes the size and mutation depth of the indel datasets.

A.3.2. Clinical datasets

We collect 65k variants from the ClinVar and gnomAD databases (Table A4).

For our indel benchmark, detailed in Section 4.1, we focus on short indels (less than or equal to three amino acids), which make up over 80% of the in-frame indels in our data. There were insufficient benign annotations for clinical indel variants, so gnomAD common variants (allele frequency > 0.5%) were used as pseudo-controls.

ClinVar processing

The clinical substitutions dataset was obtained following the procedure from EVE [Frazer et al., 2021], detailed in Supplementary Methods Section 3 of that paper (the dataset is downloadable from https://evemodel.org/download/bulk), but correcting for mapping errors to GRCh38. This yielded 2,525 proteins and 63k variants with Pathogenic/Likely Pathogenic/Benign/Likely Benign annotations and at least one star of clinical evidence (i.e., assertion criteria provided by a submitter). As a result, our dataset contains significantly more mutants than the dataset from Frazer et al. [2021] (63k vs. 42k).

The raw set of inframe indels was obtained from ClinVar on February 6th, 2023, by using the following query:

(“inframe deletion”[Molecular consequence] OR “inframe indel”[Molecular consequence] OR “inframe insertion”[Molecular consequence])

This query yielded 18407 variants. After filtering out invalid/uncertain amino acids, repeats, remaining frameshift variants, and synonymous/stop codons, 17039 (92.5%) remained.

After filtering for Benign/Pathogenic/Likely Benign/Likely Pathogenic annotations (80%+ of annotations are of uncertain significance), selecting variants in genes with at least one P/LP annotation, and keeping indels of up to 3 amino acids, 2090 / 18407 = 11.35% of the original variants remained: 330 benign and 1,760 pathogenic. When using gnomAD as the benign pseudo-controls, we only keep the 1,760 pathogenic variants from ClinVar.
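The filtering logic can be sketched with a toy example; the column names and records below are hypothetical, and the actual preprocessing from raw ClinVar exports lives in the companion codebase.

```python
import pandas as pd

# Toy ClinVar-like records (hypothetical, for illustration only).
variants = pd.DataFrame({
    "clinical_significance": ["Pathogenic", "Benign", "Uncertain significance",
                              "Likely pathogenic", "Likely benign"],
    "indel_length": [1, 2, 3, 4, 2],  # length in amino acids
})

# Keep confidently annotated variants and short indels (<= 3 amino acids).
keep_labels = {"Pathogenic", "Likely pathogenic", "Benign", "Likely benign"}
filtered = variants[
    variants["clinical_significance"].isin(keep_labels)
    & (variants["indel_length"] <= 3)
]
print(len(filtered))  # → 3
```

In this toy table, the uncertain-significance record and the 4-residue likely-pathogenic indel are dropped, mirroring the two filters described above.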

All the preprocessing code from raw ClinVar data is available in the companion codebase.

gnomAD processing

The Genome Aggregation Database (gnomAD) [Karczewski et al., 2020] seeks to aggregate genome and exome sequencing data from multiple large-scale sequencing projects, and publishes summary data such as variant allele frequencies in a consistent format. The gnomAD v2.1.1 GRCh38 liftover was downloaded on February 8th 2023 and contains 125,748 exomes and 15,708 genomes. v2 was originally based on the GRCh37 reference sequence and v2.1.1 was lifted over to the GRCh38 reference sequence.

The in-frame indels were preprocessed similarly to the ClinVar indels (the preprocessing code from raw data is also available in the repository), yielding 839 common indels up to 3 amino acids in length.

A.3.3. Access

The following provides more details on the code and data Resources (§ 6) accompanying this paper.

The open-source codebase containing a framework for scoring all the benchmarks (and easily facilitating the addition of new benchmarks) is available at https://github.com/OATML-Markslab/ProteinGym. Modifications of certain baselines (e.g. scoring of long sequences beyond the context size in the ESM suite, or pre-processing of MSAs in MSA Transformer) are also released, and all of the model predictions can be reproduced using this repository. We also include preprocessing code for the clinical data (ClinVar/gnomAD) and DMS assays for reproducibility.

We developed a user-friendly website, https://www.proteingym.org, containing a leaderboard, detailed results per assay, as well as drill-down analyses across various dimensions (e.g., mutational depth, taxa).

The DMS assays, model scores, Multiple Sequence Alignments, predicted 3D structures, processed ClinVar/gnomAD datasets, and raw files before preprocessing are made available at https://marks.hms.harvard.edu/proteingym/. Instructions for downloading are in the GitHub repository. Some model checkpoints and other files necessary for scoring (for baselines such as profileHMM, PROVEAN) are also available at the FTP server, although most model checkpoints such as ESM-1v are available from their respective repositories.

A.3.4. License

The codebase is open source under the MIT license.

A.4. Baselines

Unless otherwise specified, model scores are calculated as the log-ratio of sequence probabilities between the mutant and wild-type sequences, log [p(x_mut) / p(x_wt)], following the convention in Hopf et al. [2017].
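Since models typically return log-likelihoods directly, this convention reduces to a difference of log-probabilities; a minimal illustration with toy probabilities:

```python
import math

def log_ratio_score(logp_mutant: float, logp_wildtype: float) -> float:
    """Fitness score as log p(x_mut) - log p(x_wt).

    The log of the probability ratio is a difference of log-likelihoods;
    higher (less negative) scores indicate mutations the model predicts
    to be better tolerated.
    """
    return logp_mutant - logp_wildtype

# Toy example: mutant is 4x less likely than wild type under the model.
score = log_ratio_score(math.log(1e-6), math.log(4e-6))
print(round(score, 3))  # → -1.386
```

Working in log space avoids underflow: full-sequence probabilities are vanishingly small, but their log-ratio remains well-scaled.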

A.4.1. Zero-shot baselines

Alignment-based models
  • Site-independent Model We use a site-wise maximum entropy model to infer the contribution of site-specific amino acid constraints without considering explicit epistatic constraints. This model is implemented as described in Hopf et al. [2017].

  • HMM We use the profile hidden Markov model (HMM) implementation in HMMER [Eddy, 2011]. Profile HMMs are frequently used to generate multiple sequence alignments, but also produce log probabilities of sequences that can be used as estimates of fitness for both substitutions and indels [Durbin et al., 1998].

  • EVMutation EVMutation [Hopf et al., 2017] models pairwise evolutionary couplings between protein sequences using a Potts model (otherwise known as a Markov Random Field).

  • DeepSequence DeepSequence [Riesselman et al., 2018] uses a VAE architecture to learn higher-order, non-linear evolutionary constraints within each protein family. Mutation effect scores are calculated similarly to EVmutation, as the log-ratio between the mutant and wild-type sequence probabilities, log [p(x_mut|θ) / p(x_wt|θ)], but using the VAE evidence lower bound (ELBO) as a proxy for log p(x|θ).

  • WaveNet We use a previously published dilated convolutional neural network (dilCNN) based on the WaveNet architecture [Shin et al., 2021] as an example of a family-specific sequence decoder capable of handling indels. Due to the expense of training a separate model for each protein, we only evaluate this model against the DMS datasets. Sequence scores are calculated as the difference in (length-normalized) log-likelihoods between the mutant and wild-type sequences.

  • EVE EVE [Frazer et al., 2021] is a Bayesian VAE model architecture for predicting clinical variant effects. The model includes a Gaussian Mixture Model fitted to the background distribution of mutations, in order to provide interpretable protein-specific pathogenicity scores. We use the ClinVar preprocessing pipeline from EVE, and EVE is also used in TranceptEVE [Notin et al., 2022b].

  • GEMME GEMME is the Global Epistatic Model for predicting Mutational Effects. It infers the conservation of combinations of mutations across the entire sequence according to an evolutionary tree [Engelen et al., 2009] and combines it with site-wise frequencies to calculate a combined epistatic sequence score for mutations [Laine et al., 2019]. GEMME takes as input a multiple sequence alignment of the protein family together with the specific mutations to score. To obtain scores, we used the GEMME web-tool hosted at http://www.lcqb.upmc.fr/GEMME/submit.html with default parameters.

Protein language models

Protein language models adapt architectures and training objectives popularised in natural language processing; most of the models below are variants of the Transformer [Vaswani et al., 2017].

  • UniRep UniRep [Alley et al., 2019b] trains a Long Short-Term Memory (LSTM) model on UniRef50 [Suzek et al., 2015] sequences. It learns an internal representation of proteins by training on next-amino-acid prediction, minimizing a cross-entropy loss. While the core model is trained on unaligned sequences, UniRep can also be fine-tuned on sets of homologous sequences from a given family, retrieved via an MSA. This process is called ‘evotuning’ and typically leads to stronger fitness prediction performance.

  • ESM ESM-1b [Rives et al., 2021] and ESM-1v [Meier et al., 2021] are protein language models with a Transformer encoder architecture similar to BERT [Devlin et al., 2019], trained with a Masked-Language Modeling (MLM) objective on UniRef50 and UniRef90 respectively. We extend the original ESM codebase for these two models to handle sequences that are longer than the model context window (i.e., 1023 amino acids), with the approach described in Brandes et al. [2023] for ESM-1b and in Notin et al. [2022a] for ESM-1v. We predict fitness for ESM models with the masked-marginal approach introduced in Meier et al. [2021], which provides optimal performance on substitutions but does not support indels.

  • CARP CARP [Yang et al., 2023a] is a protein language model trained with a MLM objective on UniRef50. The architecture leverages convolutions instead of self-attention, leading to computational speedups while maintaining high downstream task performance.

  • RITA RITA [Hesslow et al., 2022] is an autoregressive language model akin to GPT2 [Radford et al., 2019], trained on UniRef100 [Suzek et al., 2015]. Four model sizes are available, ranging from 85 million to 1.2 billion parameters. RITA takes unaligned sequences as input, and can score both substitution and indel mutations.

  • ProGen2 ProGen2 [Nijkamp et al., 2022] is an autoregressive protein language model trained on a mixture of UniRef90 [Suzek et al., 2014] and BFD30 [Steinegger and Söding, 2018]. It follows the standard transformer decoder architecture, and five models of different sizes are available, ranging from 151 million to 6.4 billion parameters. ProGen2 takes unaligned sequences as input, and can score both indel and substitution mutations.

  • VESPA VESPA [Marquet et al., 2022] is a Single Amino acid Variant (SAV) effect predictor based on a combination of embeddings from the protein language model ProtT5 [Elnaggar et al., 2021] and per-residue conservation predictions.
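The masked-marginal scoring used for the ESM models above can be sketched as follows; `masked_logprobs` is a hypothetical stand-in for a real masked language model, and the toy uniform model is ours for illustration.

```python
import numpy as np

AAS = "ACDEFGHIKLMNPQRSTVWY"

def masked_marginal_score(wt_seq, mutations, masked_logprobs):
    """Masked-marginal scoring in the spirit of Meier et al. [2021].

    `masked_logprobs(seq, pos)` stands in for a masked language model:
    it returns log-probabilities over the 20 amino acids at `pos` after
    masking that position. `mutations` are (pos, wt_aa, mut_aa) tuples
    (0-indexed); the score sums log p(mut) - log p(wt) over positions.
    """
    score = 0.0
    for pos, wt_aa, mut_aa in mutations:
        assert wt_seq[pos] == wt_aa, "wild-type mismatch"
        logp = masked_logprobs(wt_seq, pos)
        score += logp[AAS.index(mut_aa)] - logp[AAS.index(wt_aa)]
    return score

# Dummy "model": uniform log-probabilities, so every mutation scores 0.
uniform = lambda seq, pos: np.log(np.full(20, 1 / 20))
print(masked_marginal_score("MKT", [(1, "K", "A")], uniform))  # → 0.0
```

Because each position is masked independently against the wild-type background, this approach handles substitutions naturally but cannot score indels, which change the sequence length.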

Inverse Folding models

Inverse folding models learn the conditional distribution of sequences that are likely to fold into an input protein structure [Ingraham et al., 2019]. Given that experimentally solved structures may not exist for the target sequence of every DMS assay in ProteinGym, we generate input structures using AlphaFold2 (AF2) [Jumper et al., 2021]. An inverse folding model in combination with AF2 forms an end-to-end scoring pipeline that only requires a protein sequence to score variants. As the sequence representation size is defined by the size of the input structure, the models we benchmark here can only score substitutions.

  • ProteinMPNN ProteinMPNN [Dauparas et al., 2022] takes in a protein backbone structure and featurizes it as a graph where backbone (N, C, Cα) atoms are nodes and edges are determined via Euclidean distance cut-offs. The model uses a message passing neural network (MPNN) [Ingraham et al., 2019] to encode the structure into a latent graph representation. The model then decodes the representation and samples sequences autoregressively.

  • MIF The Masked Inverse Folding (MIF) and Masked Inverse Folding with Sequence Transfer (MIF-ST) models [Yang et al., 2023b] are structure-conditioned protein language models trained with a MLM objective. MIF is trained on CATH 4.2 [Dawson et al., 2016], and MIF-ST further augments the MIF model with embeddings from CARP (640M).

  • ESM-IF1 ESM-IF1 [Hsu et al., 2022b] functions similarly to ProteinMPNN but leverages a Geometric Vector Perceptron [Jing et al., 2020] (an equivariant message passing module ideal for coordinate data) as the architecture for the structure encoder and sequence decoder.

Hybrid models
  • MSA Transformer The MSA Transformer [Rao et al., 2021] learns a representation of Multiple Sequence Alignments (MSAs) by training an axial-attention transformer [Ho et al., 2019a] with an MLM objective across a diverse set of 26 million MSAs.

  • Tranception Tranception [Notin et al., 2022a] combines an autoregressive protein language model with inference-time retrieval from an MSA. We evaluate Tranception Small (85M parameters), Tranception Medium (300M parameters) and Tranception Large (700M parameters), both with and without MSA retrieval. Tranception can score both indel and substitution mutations.

  • TranceptEVE TranceptEVE augments Tranception with priors for the amino acid distribution at each position based on an ensemble of EVE models for the protein family of interest. The final output log probability is thus a weighted sum of that EVE log prior, the log probability from the autoregressive transformer model in Tranception, as well as site-specific log probabilities obtained from a retrieved MSA (as in the inference-time retrieval procedure described in Tranception). TranceptEVE can score both indels and substitutions.

A.4.2. Supervised baselines

We leverage the various supervised baselines defined in Notin et al. [2023]:

  • One-hot encoding (OHE) models OHE baselines take as input a one-hot encoding representation of the amino acid sequence, together with zero-shot fitness predictions obtained with several of the baselines discussed above in Appendix A.4.1. Both are input into an L2-penalized regression, following the approach discussed in [Hsu et al., 2022a];

  • Embeddings models Embeddings models are based on mean-pooled embeddings from various protein language models introduced above (e.g., Tranception, ESM-1v, MSA Transformer), augmented with zero-shot fitness predictions from the same model. We refer to Notin et al. [2023] for all implementation details;

  • ProteinNPT ProteinNPT [Notin et al., 2023] is a semi-supervised non-parametric transformer [Kossen et al., 2022] which learns a joint representation of full batches of labeled sequences. It is trained with a hybrid objective consisting of fitness prediction and masked amino acids reconstruction. The model can be used to predict mutation effects for single or multiple properties simultaneously, and sample novel sequences conditioned on label values of interest.
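The OHE suite can be sketched as follows. All data, sequence lengths and feature dimensions below are illustrative; the sketch follows the spirit of the Hsu et al. [2022a] augmentation rather than the exact ProteinGym training code.

```python
import numpy as np
from sklearn.linear_model import Ridge

AAS = "ACDEFGHIKLMNPQRSTVWY"

def ohe(seq):
    """One-hot encode a protein sequence (flattened L x 20 matrix)."""
    x = np.zeros((len(seq), len(AAS)))
    for i, aa in enumerate(seq):
        x[i, AAS.index(aa)] = 1.0
    return x.ravel()

# Toy data: mutated sequences, their zero-shot scores, and DMS labels.
seqs = ["MKTA", "MKTC", "MATA", "MKCA"]
zero_shot = np.array([-0.1, -1.2, -0.5, -2.0])  # e.g., from a zero-shot model
labels = np.array([0.8, -0.9, 0.1, -1.5])       # toy DMS measurements

# Append the zero-shot prediction as an extra feature column.
X = np.hstack([np.stack([ohe(s) for s in seqs]), zero_shot[:, None]])
model = Ridge(alpha=1.0).fit(X, labels)          # L2-penalized regression
print(model.predict(X).shape)  # → (4,)
```

The zero-shot column lets the regression inherit the unsupervised model's prior while the OHE features absorb assay-specific residual signal, which is why these augmented baselines outperform plain OHE (see Table 3).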

A.4.3. Clinical baselines

We leverage a set of clinical variant effect predictors from dbNSFP v4.4a [Liu et al., 2011, 2020], which is a database of functional predictions for all possible non-synonymous single-nucleotide variants (nsSNVs) in the human genome.

These models were developed primarily to assess the effects of mutations in humans and are included in the clinical benchmarks only.

For ESM-1b, we downloaded precomputed scores [Brandes and Ntranos, 2023] from a recent study that extended ESM-1b to predict all possible missense variant effects in the human genome [Brandes et al., 2023]. TranceptEVE and EVE models were trained for the subset of 2,525 proteins in the clinical benchmark, and the model weights/scores are provided online for further analysis (see Section A.3.3). GEMME scores were obtained as detailed above. We provide an analysis of performance on clinical datasets vs. the subset of assays on human proteins in Fig. 2.

A.5. Detailed performance results

A.5.1. DMS substitution benchmarks

Zero-shot

Table A5 shows the results for our zero-shot DMS substitutions benchmark. We report Spearman’s rank correlations and bootstrapped standard error estimates for forty baseline models. Table A6 breaks down our substitution DMS results by MSA depth, Table A7 by function type, Table A8 by taxa, and Table A9 by mutational depth. To compute the final Spearman’s rank correlation reported in Table A5, we first average all the assays for a particular function type together, resulting in five average values (one each for Activity, Binding, Expression, Organismal Fitness, and Stability). The average of these five numbers is the final reported value.
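The two-level averaging can be illustrated with toy numbers (the values below are hypothetical, not from the benchmark):

```python
import pandas as pd

# Toy per-assay Spearman values grouped by function type.
assays = pd.DataFrame({
    "function_type": ["Activity", "Activity", "Binding", "Stability"],
    "spearman": [0.40, 0.50, 0.30, 0.60],
})

# First average within each function type, then across types, so that
# heavily assayed categories do not dominate the final number.
per_type = assays.groupby("function_type")["spearman"].mean()
final = per_type.mean()  # unweighted mean over function types
print(round(final, 3))  # → 0.45
```

This weighting is why the aggregate Spearman differs from a naive mean over all assays whenever function types contribute unequal numbers of assays.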

Clustering zero-shot substitution models

We clustered the zero-shot models using hierarchical clustering on the vector of per-assay NDCG metrics over the DMS substitution benchmark (Fig. A1). We find that models with the same architecture tend to cluster together (e.g., RITA models), though there are exceptions (e.g., ESM-2 models). We also observe that the alignment-based models tend to cluster together, suggesting that training on the same MSA may promote similar scoring behavior.

Supervised

Table A10 shows the results for our supervised DMS substitutions benchmark. We report Spearman’s rank correlations for 10 baseline models.

A.5.2. DMS indel benchmarks

Zero-shot

Table A14 shows the results for our zero-shot DMS indels benchmark, and Table A15 shows Spearman’s rank correlations for each indel DMS dataset and model. Figure A2 compares each model’s aggregate performance between the Library and Designed DMS sets (numbers provided in Table 4). More detailed performance files are available in the repository.

Supervised

Table A16 ranks the performance of each model on the supervised indel DMS benchmark.

A.5.3. Clinical substitution benchmarks

As discussed in § 4, since the performance of zero-shot models is on par with, or higher than, that of their supervised counterparts, we subsume the clinical zero-shot and supervised rankings into a single combined ranking, available in Table A17. Although supervised models trained on ClinVar labels (such as ClinPred) perform well on the clinical benchmark, unsupervised models (such as TranceptEVE) provide better performance on the subset of DMS assays assessing the clinical effect of variants in humans, and competitive performance on the clinical benchmark, without being subject to the same label biases (see Fig. 2).

A.5.4. Clinical indel benchmarks

Table A18 shows model performance on the ClinVar datasets, and Figure A3 shows the combined performance on the DMS and ClinVar indel benchmarks.

Figure A1: Hierarchical clustering of zero-shot models by NDCG performance.

Heatmap colored by the Pearson correlation of the NDCG@10% values for each DMS assay for each pair of zero-shot models. Lighter color corresponds to higher correlation. The ordering and dendrogram were produced by hierarchical clustering of the correlation values.

Figure A2: Performance comparison of indel baselines on different types of assays.

Spearman’s rank correlation over unbiased libraries vs model-designed sequences biased towards natural sequences.

Figure A3: Performance comparison of indel baselines on the indel benchmarks.

AUC over ClinVar with gnomAD controls (x axis) and Spearman’s rank correlation over functional assay benchmarks (y axis).

Table A1: Evolution of protein fitness benchmarks.

ProteinGym v0.1 corresponds to benchmarks in Notin et al. [2022a], while ProteinGym v1.0 corresponds to benchmarks in this paper. The EVmutation benchmark was introduced in Hopf et al. [2017], while the DeepSequence benchmark was developed in Riesselman et al. [2018]. Sub., Ind., and mut. are shorthand for substitutions, indels, and mutants, respectively.

Category Mut. Type Metric or Setting EVmutation Deep Sequence ProteinGym v0.1 ProteinGym v1.0
DMS Sub. Assays (mut.) 26 (0.1M) 38 (0.7M) 87 (1.6M) 217 (2.4M)
Ind. Assays (mut.) 0 (0k) 0 (0k) 7 (270k) 66 (289k)
Clinical Sub. Genes (mut.) 0 0 0 2,525 (63k)
Ind. Genes (mut.) 0 0 0 1,555 (3k)
Training regime Sub. Zero-shot
Ind. Zero-shot - -
Sub. Supervised - - -
Ind. Supervised - - -
Baselines Sub. Zero-shot 5 3 9 42
Ind. Zero-shot 0 0 3 20
Sub. Supervised 0 0 0 9
Ind. Supervised 0 0 0 3
Metrics - Zero-shot 2 3 3 5
- Supervised 0 0 0 2
Table A2: DMS assay function types.

The number of substitution and indel assays in each of the 5 function type categories and a general description used to categorize the assays.

Function type # Subs # Indels Description
Activity 43 3 Assays that directly or indirectly measure a protein’s catalytic (or otherwise biochemical) activity
Binding 14 0 Assays that measure the affinity or the degree to which a protein binds its target
Expression 17 2 Assays that measure how much the protein is expressed in a cell
Organismal fitness 77 6 Assays that measure how much changes in the protein affect an organism’s growth rate
Stability 66 55 Assays that measure how thermostable a protein is
Table A3: Summary of indel datasets.

Dataset #Datapoints (Benign/Path.) Mutation Depth (Min/Mean/Max) Mutation Source
DMS Assays
AAV 24,909 1/3.57/11 randomization
β-Lac 4,751 1/1/1 library
Kir2.1 10,502 1/1.2/3 library
MtrA 331 8/8/8 library
PTEN 314 1/1/1 library
TP53 341 1/1.5/2 library
amyloid β 2,354 1/14/39 library
OCT1 543 1/1/1 library
Tsuboyama 14,280 1/2.7/3 library
Assays of Natural and Designed Sequences
AAV 225,998 3/13.9/37 model-designed
CM 3,074 1/68.9/82 model-designed
HIS3 6,102 1/8.4/29 interpolations between natural sequences
Human Variants
ClinVar 3k (1,760/839) 1/1.37/3 population variation
Table A4: Summary of ClinVar human variant datasets.

Dataset #Proteins #Variants #Variants per Protein (Median)
Substitutions 2,525 63k 6
Indels 1,555 3k 1
Table A5: ProteinGym - Zero-shot substitution DMS benchmark.

Ranking based on Spearman’s rank correlation between experimental assay measurement and model prediction. The standard error reported corresponds to the non-parametric bootstrap standard error of the difference between the Spearman performance of a given model and that of the best overall model (i.e., TranceptEVE), computed over 10k bootstrap samples from the set of proteins in the ProteinGym DMS substitution benchmark.

Ranking Model Type Spearman Std. error
1* TranceptEVE L Hybrid model 0.457 0.000
1* GEMME Alignment-based model 0.457 0.007
3 TranceptEVE M Hybrid model 0.456 0.004
4 TranceptEVE S Hybrid model 0.453 0.004
5 EVE (ensemble) Alignment-based model 0.441 0.006
6 VESPA Protein language model 0.437 0.006
7 Tranception L Hybrid model 0.436 0.003
8 EVE (single) Alignment-based model 0.434 0.005
9 MSA Transformer (ensemble) Hybrid model 0.434 0.009
10 Tranception M Hybrid model 0.429 0.005
11 ESM-IF1 Inverse folding model 0.422 0.012
12 DeepSequence (ensemble) Alignment-based model 0.422 0.008
13 MSA Transformer (single) Hybrid model 0.421 0.009
14 Tranception S Hybrid model 0.419 0.006
15 ESM-2 (650M) Protein language model 0.419 0.011
16 ESM-1v (ensemble) Protein language model 0.416 0.011
17 DeepSequence (single) Alignment-based model 0.411 0.008
18 ESM-2 (3B) Protein language model 0.410 0.010
19 ESM-2 (15B) Protein language model 0.405 0.010
20 MIF-ST Inverse folding model 0.401 0.010
21 ESM-1b Protein language model 0.399 0.010
22 EVmutation Alignment-based model 0.397 0.006
23 VESPAl Protein language model 0.393 0.007
24 ProGen2 XL Protein language model 0.392 0.008
25 ESM-2 (150M) Protein language model 0.392 0.012
26 ESM-1v (single) Protein language model 0.385 0.012
27 MIF Inverse folding model 0.381 0.011
28 ProGen2 L Protein language model 0.381 0.008
29 ProGen2 M Protein language model 0.380 0.008
30 ProGen2 Base Protein language model 0.379 0.009
31 Tranception L no retrieval Protein language model 0.375 0.008
32 CARP (640M) Protein language model 0.373 0.011
33 RITA XL Protein language model 0.373 0.009
34 RITA L Protein language model 0.366 0.009
35 Site-Independent Alignment-based model 0.361 0.010
36 RITA M Protein language model 0.350 0.010
37 Tranception M no retrieval Protein language model 0.349 0.009
38 UniRep evotuned Hybrid model 0.347 0.009
39 ProGen2 S Protein language model 0.336 0.011
40 CARP (76M) Protein language model 0.332 0.012
41 ESM-2 (35M) Protein language model 0.325 0.014
42 RITA S Protein language model 0.304 0.012
43 Tranception S no retrieval Protein language model 0.303 0.012
44 CARP (38M) Protein language model 0.283 0.013
45 ProteinMPNN Inverse folding model 0.258 0.011
46 ESM-2 (8M) Protein language model 0.229 0.015
47 WaveNet Alignment-based model 0.216 0.017
48 UniRep Protein language model 0.193 0.016
49 ProtGPT2 Protein language model 0.188 0.011
50 CARP (600K) Protein language model 0.108 0.017
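The paired, non-parametric bootstrap described in the caption of Table A5 can be sketched as follows, with synthetic per-protein scores standing in for the real benchmark values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-protein Spearman scores for the best model and a
# competitor over n proteins (illustrative values, not benchmark data).
n = 150
best = rng.normal(0.46, 0.10, n)
other = best - rng.normal(0.02, 0.05, n)   # slightly worse, correlated

# Non-parametric bootstrap of the *difference* in mean performance:
# resample proteins with replacement, recompute the paired difference.
n_boot = 10_000
idx = rng.integers(0, n, size=(n_boot, n))
diffs = best[idx].mean(axis=1) - other[idx].mean(axis=1)
std_err = diffs.std(ddof=1)   # bootstrap SE of the paired difference
```

Resampling proteins (rather than individual mutants) respects the fact that performance is correlated within an assay, which is why the reported errors are computed at the protein level.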
Table A6: ProteinGym - Zero-shot substitution DMS benchmark by MSA depth.

Average Spearman’s rank correlation between model scores and experimental measurements by MSA depth on the ProteinGym substitution benchmark. Alignment depth is measured by the ratio of the effective number of sequences Neff in the MSA (following Hopf et al. [2017]) to the length covered, L (Low: Neff/L<1; Medium: 1<Neff/L<100; High: Neff/L>100). The ‘All’ column is the average across the 3 depths.

Model type Model name Spearman by MSA depth (↑)
Low Medium High All
Alignment-based Site-Independent 0.426 0.373 0.320 0.373
WaveNet 0.207 0.255 0.207 0.223
EVmutation 0.403 0.423 0.410 0.412
DeepSequence (ens.) 0.383 0.428 0.473 0.428
EVE (ens.) 0.425 0.453 0.481 0.453
GEMME 0.455 0.470 0.497 0.474
Protein language UniRep 0.181 0.161 0.209 0.184
CARP (640M) 0.314 0.375 0.428 0.372
ESM-1b 0.350 0.398 0.482 0.410
ESM-2 (15B) 0.357 0.414 0.473 0.415
RITA XL 0.315 0.382 0.412 0.370
ESM-1v (ens.) 0.326 0.418 0.502 0.415
ProGen2 XL 0.354 0.405 0.444 0.401
VESPA 0.427 0.455 0.484 0.455
Hybrid UniRep evotuned 0.330 0.344 0.372 0.349
MSA Transformer (ens.) 0.404 0.450 0.488 0.447
Tranception L 0.432 0.438 0.473 0.448
TranceptEVE L 0.451 0.467 0.492 0.470
Inverse Folding ESM-IF1 0.300 0.431 0.544 0.425
MIF-ST 0.376 0.403 0.456 0.412
ProteinMPNN 0.173 0.280 0.434 0.296
Table A7: ProteinGym - Zero-shot substitution DMS benchmark by function type.

Average Spearman’s rank correlation between model scores and experimental measurements on the ProteinGym substitution benchmark, separated into five functional categories (Activity, Binding, Organismal Fitness, Stability and Expression). ‘All’ is the average of all the categories.

Model type Model name Spearman by Function Type (↑)
Activity Binding Expression Organismal Fitness Stability All
Alignment-based Site-Independent 0.369 0.345 0.351 0.382 0.358 0.361
WaveNet 0.219 0.187 0.185 0.303 0.182 0.215
EVmutation 0.440 0.322 0.382 0.411 0.430 0.397
DeepSequence (ens.) 0.455 0.368 0.396 0.413 0.476 0.422
EVE (ens.) 0.464 0.394 0.406 0.447 0.491 0.440
GEMME 0.482 0.387 0.443 0.452 0.519 0.457
Protein language UniRep 0.182 0.203 0.230 0.141 0.210 0.193
CARP (640M) 0.395 0.274 0.419 0.364 0.414 0.373
ESM-1b 0.428 0.289 0.427 0.351 0.500 0.399
ESM-2 (15B) 0.405 0.318 0.425 0.388 0.488 0.405
RITA XL 0.366 0.303 0.416 0.381 0.398 0.373
ESM-1v (ens.) 0.414 0.320 0.456 0.387 0.500 0.415
ProGen2 XL 0.402 0.302 0.423 0.387 0.445 0.392
VESPA 0.468 0.365 0.410 0.440 0.500 0.437
Hybrid UniRep evotuned 0.355 0.304 0.366 0.346 0.366 0.347
MSA Transformer (ens.) 0.469 0.343 0.439 0.421 0.495 0.433
Tranception L 0.465 0.351 0.455 0.436 0.471 0.436
TranceptEVE L 0.487 0.381 0.456 0.460 0.500 0.457
Inverse Folding ESM-IF1 0.368 0.392 0.403 0.324 0.624 0.422
MIF-ST 0.390 0.323 0.432 0.373 0.486 0.401
ProteinMPNN 0.197 0.165 0.198 0.165 0.566 0.258
Table A8: ProteinGym - Zero-shot substitution DMS benchmark by taxa.

Average Spearman’s rank correlation between model scores and experimental measurements on the ProteinGym substitution benchmark, separated by taxon. ‘All’ is the average across the taxa.

Model type Model name Spearman by Taxa (↑)
Human Other Eukaryote Prokaryote Virus All
Alignment-based Site-Independent 0.379 0.385 0.316 0.383 0.366
WaveNet 0.145 0.305 0.293 0.283 0.256
EVmutation 0.409 0.444 0.422 0.388 0.416
DeepSequence (ens.) 0.442 0.469 0.460 0.344 0.429
EVE (ens.) 0.453 0.487 0.468 0.428 0.459
GEMME 0.468 0.510 0.473 0.469 0.480
Protein language UniRep 0.213 0.219 0.165 0.057 0.164
CARP (640M) 0.416 0.386 0.390 0.273 0.366
ESM-1b 0.434 0.475 0.455 0.241 0.401
ESM-2 (15B) 0.431 0.449 0.459 0.313 0.413
RITA XL 0.394 0.384 0.353 0.402 0.383
ESM-1v (ens.) 0.458 0.446 0.454 0.289 0.412
ProGen2 XL 0.384 0.442 0.439 0.391 0.414
VESPA 0.438 0.492 0.490 0.432 0.463
Hybrid UniRep evotuned 0.355 0.363 0.346 0.349 0.353
MSA Transformer (ens.) 0.437 0.505 0.463 0.414 0.455
Tranception L 0.453 0.483 0.431 0.432 0.450
TranceptEVE L 0.471 0.498 0.473 0.453 0.474
Inverse Folding ESM-IF1 0.415 0.497 0.507 0.374 0.448
MIF-ST 0.404 0.415 0.463 0.396 0.420
ProteinMPNN 0.282 0.395 0.354 0.248 0.320
Table A9: ProteinGym - Zero-shot substitution DMS benchmark by mutational depth.

Average Spearman’s rank correlation between model scores and experimental measurements on the ProteinGym substitution benchmark, separated by mutational depth (1, 2, 3, 4, and 5 or more). The ‘All’ column is the average across the 5 depths.

Model type Model name Spearman by Mutational Depth (↑)
1 2 3 4 5+ All
Alignment-based Site-Independent 0.336 0.235 0.226 0.267 0.350 0.283
WaveNet 0.176 0.059 0.218 0.181 0.258 0.178
EVmutation 0.376 0.274 0.324 0.301 0.394 0.334
DeepSequence (ens.) 0.405 0.264 0.313 0.309 0.378 0.334
EVE (ens.) 0.428 0.273 0.308 0.298 0.355 0.332
GEMME 0.447 0.274 0.321 0.324 0.414 0.356
Protein language UniRep 0.175 0.071 0.111 0.141 0.191 0.138
CARP (640M) 0.390 0.213 0.187 0.164 0.162 0.223
ESM-1b 0.384 0.227 0.187 0.149 0.270 0.243
ESM-2 (15B) 0.407 0.204 0.239 0.172 0.234 0.251
RITA XL 0.356 0.139 0.136 0.154 0.233 0.204
ESM-1v (ens.) 0.403 0.221 0.186 0.151 0.203 0.233
ProGen2 XL 0.385 0.184 0.280 0.219 0.280 0.270
VESPA 0.434 0.183 0.357 0.302 0.328 0.321
Hybrid UniRep evotuned 0.319 0.154 0.250 0.226 0.294 0.249
MSA Transformer (ens.) 0.426 0.238 0.384 0.366 0.408 0.364
Tranception L 0.423 0.258 0.352 0.318 0.387 0.348
TranceptEVE L 0.446 0.280 0.350 0.320 0.382 0.356
Inverse Folding ESM-IF1 0.439 0.345 0.290 0.289 0.358 0.344
MIF-ST 0.430 0.265 0.334 0.298 0.298 0.325
ProteinMPNN 0.292 0.257 0.171 0.186 0.278 0.237
Table A10: ProteinGym - Supervised substitution DMS benchmark.

Ranking based on Spearman’s rank correlation between experimental assay measurement and model prediction.

Ranking Model name Model type Spearman
1 ProteinNPT NPT 0.613
2 Tranception Embeddings 0.571
3 MSA Transformer Embeddings 0.568
4 ESM-1v Embeddings 0.542
5 TranceptEVE OHE 0.477
6 Tranception OHE 0.458
7 MSAT OHE 0.453
8 DeepSequence OHE 0.440
9 ESM-1v OHE 0.417
10 OHE w/o augmentation OHE 0.224
Table A11: Supervised substitution DMS benchmark by MSA depth.

Average Spearman’s rank correlation between model scores and experimental measurements by MSA depth on the ProteinGym substitution benchmark. Alignment depth is measured by the ratio of the effective number of sequences Neff in the MSA (following Hopf et al. [2017]) to the length covered, L (Low: Neff/L<1; Medium: 1<Neff/L<100; High: Neff/L>100).

Model type Model name Spearman by MSA depth (↑)
Low Medium High All
NPT ProteinNPT 0.701 0.587 0.608 0.632
Embeddings Tranception 0.621 0.556 0.561 0.579
MSAT 0.685 0.518 0.567 0.590
ESM-1v 0.653 0.465 0.541 0.553
One-hot encoding TranceptEVE 0.503 0.483 0.468 0.485
Tranception 0.490 0.455 0.445 0.463
MSAT 0.500 0.441 0.448 0.463
DeepSequence 0.482 0.422 0.426 0.443
ESM-1v 0.496 0.338 0.400 0.411
No Augmentation 0.246 0.204 0.227 0.226
Table A12: Supervised substitution DMS benchmark by function type.

Average Spearman’s rank correlation between supervised model scores and experimental measurements on the ProteinGym substitution benchmark, separated into five functional categories: Activity, Binding, Expression, Organismal Fitness, and Stability. The ‘All’ column is the average of all the categories.

Model type Model name Spearman by Function Type (↑)
Activity Binding Expression Organismal Fitness Stability All
NPT ProteinNPT 0.577 0.536 0.637 0.545 0.772 0.613
Embeddings Tranception 0.520 0.529 0.613 0.519 0.674 0.571
MSAT 0.547 0.470 0.584 0.493 0.749 0.569
ESM-1v 0.487 0.450 0.587 0.468 0.717 0.542
One-hot encoding TranceptEVE 0.502 0.444 0.476 0.470 0.493 0.477
Tranception 0.475 0.416 0.476 0.448 0.473 0.458
MSAT 0.480 0.393 0.463 0.437 0.491 0.453
DeepSequence 0.467 0.418 0.424 0.422 0.471 0.440
ESM-1v 0.421 0.363 0.452 0.383 0.463 0.416
No Augmentation 0.213 0.212 0.226 0.194 0.273 0.224
Table A13: Supervised substitution DMS benchmark by taxa.

Average Spearman’s rank correlation between model scores and experimental measurements on the ProteinGym substitution benchmark, separated into four taxon categories. Assays are split into one of Human, Prokaryote, Other Eukaryote, or Virus. The All column is the average across the categories.

Model type Model name Spearman by Taxa (↑)
Human Other Eukaryote Prokaryote Virus All
NPT ProteinNPT 0.649 0.628 0.668 0.580 0.631
Embeddings Tranception 0.569 0.582 0.594 0.568 0.578
MSAT 0.634 0.579 0.648 0.521 0.596
ESM-1v 0.565 0.579 0.617 0.433 0.548
One-hot encoding TranceptEVE 0.481 0.490 0.475 0.478 0.481
Tranception 0.457 0.472 0.453 0.456 0.460
MSAT 0.482 0.459 0.468 0.448 0.464
DeepSequence 0.451 0.460 0.455 0.383 0.437
ESM-1v 0.426 0.444 0.452 0.292 0.404
No Augmentation 0.236 0.217 0.233 0.238 0.231
Table A14: ProteinGym - Zero-shot indel DMS benchmark.

Ranking based on Spearman’s rank correlation between experimental assay measurement and model prediction. The standard error reported corresponds to the non-parametric bootstrap standard error of the difference between the Spearman performance of a given model and that of the best overall model (i.e., TranceptEVE), computed over 10k bootstrap samples from the set of proteins in the ProteinGym DMS indel benchmark.

Ranking Model Type Spearman Std. error
1* TranceptEVE M Hybrid model 0.467 0.000
1* ProGen2 M Protein language model 0.467 0.054
3 ProGen2 Base Protein language model 0.466 0.051
4 RITA L Protein language model 0.459 0.039
5 TranceptEVE L Hybrid model 0.457 0.014
6 Tranception M no retrieval Protein language model 0.455 0.045
7 RITA XL Protein language model 0.452 0.038
8 ProGen2 L Protein language model 0.451 0.049
9 Tranception L no retrieval Protein language model 0.439 0.040
10 RITA M Protein language model 0.439 0.032
11 ProGen2 XL Protein language model 0.434 0.029
12 ProGen2 S Protein language model 0.425 0.049
13 Tranception S no retrieval Protein language model 0.412 0.045
14 Tranception L Hybrid model 0.399 0.032
15 RITA S Protein language model 0.399 0.036
16 Tranception M Hybrid model 0.398 0.029
17 Hidden Markov Model Alignment-based model 0.391 0.028
18 TranceptEVE S Hybrid model 0.361 0.032
19 PROVEAN Alignment-based model 0.351 0.029
20 Tranception S Hybrid model 0.344 0.031
21 WaveNet Alignment-based model 0.285 0.046
22 ProtGPT2 Protein language model 0.194 0.033
23 UniRep Protein language model 0.169 0.061
Table A15: Spearman’s rank correlation between model scores and individual deep mutational scans of indels.

Dataset Tranception M Tranception L Tranception M+Ret Tranception L+Ret ProGen2 M ProGen2 XLarge PROVEAN HMM WaveNet
DMS Assays
AAV 0.126 0.210 0.371 0.338 −0.100 0.167 0.177 0.057 −0.007
β-Lac 0.379 0.296 0.365 0.344 0.619 0.409 0.385 0.347 0.437
KCNJ2 0.412 0.391 0.437 0.440 0.432 0.386 0.386 0.368 0.408
MtrA 0.615 0.375 0.612 0.395 0.403 0.348 0.278 0.472 0.244
PTEN 0.700 0.563 0.678 0.546 0.552 0.402 0.237 0.668 0.697
TP53 0.579 0.395 0.536 0.362 0.428 0.354 0.273 0.482 0.031
Assays of Natural and Designed Sequences
AAV 0.362 0.691 0.677 0.709 −0.466 0.492 0.683 0.607 0.666
CM 0.219 0.223 0.344 0.326 0.380 0.379 0.372 0.398 0.438
HIS3 0.687 0.707 0.611 0.655 0.702 0.713 0.701 0.548 0.687
Table A16: ProteinGym - Supervised indel DMS benchmark.

Ranking based on Spearman’s rank correlation between experimental assay measurement and model prediction.

Ranking Model Type Spearman
1 ESM-1v Embeddings 0.752
2 Tranception Embeddings 0.735
3 MSAT Embeddings 0.689
Table A17: ProteinGym - Clinical substitution benchmark.

Ranking based on AUROC between model prediction and ClinVar benign/pathogenic annotation.

Ranking Model Type AUC
1 ClinPred Supervised 0.981
2 MetaRNN Supervised 0.977
3 BayesDel (addAF) Supervised 0.972
4 VEST4 Supervised 0.929
5 REVEL Supervised 0.928
6 BayesDel (noAF) Supervised 0.925
7 VARITY (R) Supervised 0.921
8 TranceptEVE Unsupervised 0.920
9 GEMME Unsupervised 0.919
10 VARITY (ER) Supervised 0.918
11 EVE Unsupervised 0.917
12 gMVP Supervised 0.914
13 CADD Supervised 0.905
14 PolyPhen2 (HVAR) Supervised 0.896
15 DEOGEN2 Supervised 0.894
16 ESM-1b Unsupervised 0.892
17 PROVEAN Unsupervised 0.886
18 MPC Supervised 0.881
19 PolyPhen2 (HDIV) Supervised 0.879
20 SIFT Unsupervised 0.878
21 SIFT4G Unsupervised 0.877
22 MutationAssessor Unsupervised 0.877
23 MutPred Unsupervised 0.875
24 PrimateAI Unsupervised 0.855
25 LIST-S2 Unsupervised 0.842
26 MutationTaster Supervised 0.816
27 DANN Supervised 0.812
28 LRT Unsupervised 0.805
29 FATHMM Supervised 0.723
Table A18: ClinVar AUROC and Average Precisions.

Results for indel-compatible baselines on our ClinVar/gnomAD indel benchmark. AUPRC is area under the precision recall curve, and AUROC is area under the receiver-operating characteristic curve.

Model Type Model Name AUROC(↑) AUPRC (↑)
Alignment-based models HMM 0.679 0.775
PROVEAN 0.926 0.947
WaveNet
Protein language models UniRep 0.395 0.600
RITA XL 0.923 0.954
ProGen2 XL 0.846 0.889
Tranception L (no retrieval) 0.877 0.938
Tranception M (no retrieval) 0.858 0.929
ProtGPT2 0.655 0.779
Hybrid models Tranception L 0.857 0.920
Tranception M 0.844 0.909
TranceptEVE 0.857 0.916
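The two metrics reported in Table A18 can be computed from scores and binary labels with a small numpy-only sketch (AUROC via the pairwise Mann-Whitney form; AUPRC as average precision); this is an illustration of the metrics, not the benchmark's evaluation code:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen pathogenic variant (label 1) scores higher than a
    randomly chosen benign one (label 0), with ties counting half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

def auprc(scores, labels):
    """Area under the precision-recall curve, in average-precision form:
    mean precision taken at each positive example, scanning scores in
    descending order."""
    order = np.argsort(-np.asarray(scores, float))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return (precision * labels).sum() / labels.sum()
```

With perfectly separating scores both metrics reach 1.0, while AUPRC (unlike AUROC) stays sensitive to the benign/pathogenic class balance, which is why Table A18 reports both.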
Table A19: List of substitution datasets.

See the reference file in the GitHub repo for other info (UniProt ID, taxon, DOI, and more assay details).

Dataset Reference
β-Lactamase Jacquier et al. [2013]
β-Lactamase Stiffler et al. [2015]
β-Lactamase Firnberg et al. [2014]
β-Lactamase Deng et al. [2012]
β-Lactamase VIM-2 Chen et al. [2020]
β-Glucosidase Romero et al. [2015]
AAV Sinai et al. [2021]
ACE2 Chan et al. [2020]
ADRB2 Jones et al. [2020]
APH(3’)II, neo Melnikov et al. [2014]
APP Seuma et al. [2021]
Activation-induced deaminase Gajula et al. [2014]
Aliphatic amidase Wrenbeck et al. [2017]
Alpha-synuclein Newberry et al. [2020]
Amyloid β Gray et al. [2019]
Amyloid β Seuma et al. [2022]
Ancestral spleen tyrosine kinase Hobbs et al. [2022]
Anti-CRISPR protein AcrIIA4 Stadelmann et al. [2021]
Antitoxin ParD3 Ding et al. [2023]
Antitoxin ParD3 Aakre et al. [2015]
Arrestin-1 Ostermaier et al. [2014]
BRCA1 Findlay et al. [2018]
BRCA2 Erwood et al. [2022]
CALM1 Weile et al. [2017]
CARD11 Meitlis et al. [2020]
CASP3 Roychowdhury and Romero [2022]
CASP7 Roychowdhury and Romero [2022]
CBS (cystathionine beta-synthase) Sun et al. [2020]
CCR5 Gill et al. [2023]
CD19 Klesmith et al. [2019]
CVB3 capsid Mattenberger et al. [2021]
CXCR4 Gill et al. [2023]
Chalcone synthase Wrenbeck et al. [2019]
Cytochrome P450 2C9 Amorosi et al. [2021]
Cytochrome P450 2C9 Amorosi et al. [2021]
D-amino acid oxidase Vanella et al. [2023]
DHFR reductase Nguyen et al. [2023a]
DHFR reductase Thompson et al. [2020]
DNA methylase HaeIII Rockah-Shmuel et al. [2015]
Dengue virus NS5 Suphatrakul et al. [2023]
Dlg4, (PSD95_PDZ3) McLaughlin et al. [2012]
EfrC Meier et al. [2023]
EfrD Meier et al. [2023]
EnvZ Ghose et al. [2023]
ErbB2 membrane domain Elazar et al. [2016]
EstA Nutschel et al. [2020]
GAL4 Kitzman et al. [2015]
GB1 Wu et al. [2016]
GB1 Olson et al. [2014]
GDI1 Silverstein et al. [2021]
GFP Sarkisyan et al. [2016]
GMR (aacC1) Dandage et al. [2018]
GRB2-SH3 Faure et al. [2022]
Gcn4 Staller et al. [2018]
Glucokinase regulatory protein Gersing et al. [2023]
Glucokinase regulatory protein Gersing et al. [2022]
Glycophorin A membrane domain Elazar et al. [2016]
Green fluorescent protein amacGFP Gonzalez Somermeyer et al. [2022]
Green fluorescent protein cgreGFP Gonzalez Somermeyer et al. [2022]
Green fluorescent protein ppluGFP2 Gonzalez Somermeyer et al. [2022]
HIV env Duenas-Decamp et al. [2016]
HIV env Haddox et al. [2018]
HIV env (BF520) Haddox et al. [2018]
HIV env (BG505) Haddox et al. [2018]
HIV rev Fernandes et al. [2016]
HIV tat Fernandes et al. [2016]
HMG-CoA reductase Jiang [2019]
HRAS Bandaru et al. [2017]
HSP82 Flynn et al. [2020]
HSP82 Mishra et al. [2016]
Hsp90 Hietpas et al. [2011]
Hydroxymethylbilane synthase van Loggerenberg et al. [2023]
IGP dehydratase (HIS3) Pokusaeva et al. [2019]
InfA Kelsic et al. [2016]
Influenza H3N2 neuraminidase Lei et al. [2023]
Influenza M1 matrix protein Hom et al. [2019]
Influenza RNA polymerase PB1 Li et al. [2023]
Influenza hemagglutinin Thyagarajan and Bloom [2014]
Influenza hemagglutinin Doud and Bloom [2016]
Influenza hemagglutinin Wu et al. [2014]
Influenza hemagglutinin Lee et al. [2018]
Influenza neuraminidase Jiang et al. [2016]
Influenza nucleoprotein Bloom [2014]
Influenza nucleoprotein Doud et al. [2015]
Influenza nucleoprotein Doud and Bloom [2016]
Influenza polymerase acidic protein Wu et al. [2015]
Influenza polymerase basic protein 2 Soh et al. [2019]
KCNE1 Muhammad et al. [2023]
KCNH2 Kozek et al. [2020]
KCNJ2 Coyote-Maestas et al. [2022]
KRAS Weng et al. [2022]
KRAS Ursu et al. [2022]
L-selectin Elazar et al. [2016]
LGK (levoglucosan kinase) Wrenbeck et al. [2019]
LGK (levoglucosan kinase) Klesmith et al. [2015]
LamB Andrews and Fields [2020]
Leucine-rich repeat protein SHOC-2 Kwon et al. [2022]
MAPK1 Brenan et al. [2016]
MET kinase Estevam et al. [2023]
MPL Bridgford et al. [2020]
MSH2 Jia et al. [2021]
MTHFR reductase Weile et al. [2021]
MlaC MacRae et al. [2023]
NPC intracellular cholesterol transporter Erwood et al. [2022]
NPC intracellular cholesterol transporter Erwood et al. [2022]
NS5A Qi et al. [2014]
NUDT15 Suiter et al. [2020]
OCT1 (SLC22A1) Yee et al. [2023]
Ornithine transcarbamylase Lo et al. [2023]
p53 Giacomelli et al. [2018]
p53 Kotler et al. [2018]
PAB1 Melamed et al. [2013]
PPARG UK Monogenic Diabetes Consortium et al. [2016]
PSD95-PDZ3 Faure et al. [2022]
PTEN Matreyek et al. [2021]
PTEN Mighell et al. [2018]
Parkin Clausen et al. [2023]
Phosphoserine aminotransferase Xie et al. [2023]
Phototropin Chen et al. [2023]
Pilin (PilE) Kennouche et al. [2019]
Plasminogen activator inhibitor-1 Huttinger et al. [2021]
Protein phosphatase 1D Miller et al. [2022]
RAF oncogene Zinkus-Boltz et al. [2019]
RNAse III (rnc) Weeks and Ostermeier [2023]
Rhodopsin Wan et al. [2019]
SARS-CoV-2 Mpro Flynn et al. [2022]
SARS-CoV-2 Spike RBD Tan et al. [2023]
SARS-CoV-2 Spike RBD Starr et al. [2020]
SCN5A Glazer et al. [2020]
SOX17 Veerapandian et al. [2018]
SOX2 Veerapandian et al. [2018]
SRC Ahler et al. [2019]
SUMO-conjugating enzyme UBC9 Weile et al. [2017]
Small ubiquitin-related modifier 1 Weile et al. [2017]
Sodium-dependent serotonin transporter Young et al. [2021]
Src Chakraborty et al. [2021]
Src Nguyen et al. [2023b]
Streptococcus pyogenes Cas9 Spencer and Zhang [2017]
TARDBP Bolognesi et al. [2019]
TIM Barrel (S. solfataricus) Chan et al. [2017]
TIM Barrel (T. maritima) Chan et al. [2017]
TIM Barrel (T. thermophilus) Chan et al. [2017]
Thiamin pyrophosphokinase 1 Weile et al. [2017]
Thiopurine S-methyltransferase (TPMT) Matreyek et al. [2018]
Toxin CcdB Tripathi et al. [2016]
Toxin CcdB Adkar et al. [2012]
Tsuboyama multi-DMS Tsuboyama et al. [2023]
Ube4b Starita et al. [2013]
Ubiquitin Roscoe et al. [2013]
Ubiquitin Roscoe and Bolon [2014]
Ubiquitin Mavor et al. [2016]
VKORC1 Chiasson et al. [2020]
VKORC1 Chiasson et al. [2020]
YAP1 Araya et al. [2012]
Zika virus env Sourisseau et al. [2019]
Table A20: List of indel datasets.

Dataset Reference
β-Lactamase Gonzalez et al. [2019]
AAV Sinai et al. [2021]
Chorismate mutase (CM) Russ et al. [2020]
IGP dehydratase (HIS3) Pokusaeva et al. [2019]
Kir2.1 Macdonald et al. [2023]
MtrA Campbell et al. [2022]
p53 Kotler et al. [2018]
PTEN phosphatase Mighell et al. [2018]
amyloid β Seuma et al. [2022]
OCT1 (SLC22A1) Yee et al. [2023]
Tsuboyama multi-DMS Tsuboyama et al. [2023]

Contributor Information

Pascal Notin, Computer Science, University of Oxford.

Aaron W. Kollasch, Systems Biology, Harvard Medical School.

Daniel Ritter, Systems Biology, Harvard Medical School.

Lood van Niekerk, Systems Biology, Harvard Medical School.

Steffanie Paul, Systems Biology, Harvard Medical School.

Hansen Spinner, Systems Biology, Harvard Medical School.

Nathan Rollins, Seismic Therapeutic.

Ada Shaw, Applied Mathematics, Harvard University.

Ruben Weitzman, Computer Science, University of Oxford.

Jonathan Frazer, Centre for Genomic Regulation, Universitat Pompeu Fabra.

Mafalda Dias, Centre for Genomic Regulation, Universitat Pompeu Fabra.

Dinko Franceschi, Systems Biology, Harvard Medical School.

Rose Orenbuch, Systems Biology, Harvard Medical School.

Yarin Gal, Computer Science, University of Oxford.

Debora S. Marks, Harvard Medical School, Broad Institute

References

  1. Aakre Christopher D., Herrou Julien, Phung Tuyen N., Perchuk Barrett S., Crosson Sean, and Laub Michael T.. Evolving New Protein-Protein Interaction Specificity through Promiscuous Intermediates. Cell, 163(3): 594–606, October 2015. ISSN 00928674. doi: 10.1016/j.cell.2015.09.055. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867415012726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Adkar Bharat V., Tripathi Arti, Sahoo Anusmita, Bajaj Kanika, Goswami Devrishi, Chakrabarti Purbani, Swarnkar Mohit K., Gokhale Rajesh S., and Varadarajan Raghavan. Protein Model Discrimination Using Mutational Sensitivity Derived from Deep Sequencing. Structure, 20(2):371–381, February 2012. ISSN 09692126. doi: 10.1016/j.str.2011.11.021. URL https://linkinghub.elsevier.com/retrieve/pii/S0969212612000068. [DOI] [PubMed] [Google Scholar]
  3. Adzhubei Ivan A, Schmidt Steffen, Peshkin Leonid, Ramensky Vasily E, Gerasimova Anna, Bork Peer, Kondrashov Alexey S, and Sunyaev Shamil R. A method and server for predicting damaging missense mutations. Nature Methods, 7(4):248–249, April 2010. ISSN 1548–7091, 1548–7105. doi: 10.1038/nmeth0410-248. URL http://www.nature.com/articles/nmeth0410-248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Ahler Ethan, Register Ames C., Chakraborty Sujata, Fang Linglan, Dieter Emily M., Sitko Katherine A., Vidadala Rama Subba Rao, Trevillian Bridget M., Golkowski Martin, Gelman Hannah, Stephany Jason J., Rubin Alan F., Merritt Ethan A., Fowler Douglas M., and Maly Dustin J.. A Combined Approach Reveals a Regulatory Mechanism Coupling Src’s Kinase Activity, Localization, and Phosphotransferase-Independent Functions. Molecular Cell, 74(2):393–408.e20, April 2019. ISSN 10972765. doi: 10.1016/j.molcel.2019.02.003. URL https://linkinghub.elsevier.com/retrieve/pii/S1097276519300930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Alirezaie Najmeh, Kernohan Kristin D, Hartley Taila, Majewski Jacek, and Hocking Toby Dylan. ClinPred: prediction tool to identify disease-relevant nonsynonymous single-nucleotide variants. The American Journal of Human Genetics, 103(4):474–483, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Alley Ethan C., Khimulya Grigory, Biswas Surojit, AlQuraishi Mohammed, and Church George M.. Unified rational protein engineering with sequence-based deep representation learning. Nature Methods, pages 1–8, 2019a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Alley Ethan C., Khimulya Grigory, Biswas Surojit, AlQuraishi Mohammed, and Church George M.. Unified rational protein engineering with sequence-based deep representation learning. Nature Methods, 16(12):1315–1322, 2019b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Amorosi Clara J., Chiasson Melissa A., McDonald Matthew G., Wong Lai Hong, Sitko Katherine A., Boyle Gabriel, Kowalski John P., Rettie Allan E., Fowler Douglas M., and Dunham Maitreya J.. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. The American Journal of Human Genetics, 108(9):1735–1751, September 2021. ISSN 00029297. doi: 10.1016/j.ajhg.2021.07.001. URL https://linkinghub.elsevier.com/retrieve/pii/S000292972100269X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Andrews Bryan and Fields Stanley. Distinct patterns of mutational sensitivity for λ resistance and maltodextrin transport in escherichia coli LamB. Microbial Genomics, 6(4), April 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Araya Carlos L., Fowler Douglas M., Chen Wentao, Muniez Ike, Kelly Jeffery W., and Fields Stanley. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proceedings of the National Academy of Sciences, 109(42):16858–16863, October 2012. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1209751109. URL https://pnas.org/doi/full/10.1073/pnas.1209751109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Bandaru Pradeep, Shah Neel H, Bhattacharyya Moitrayee, Barton John P, Kondo Yasushi, Cofsky Joshua C, Gee Christine L, Chakraborty Arup K, Kortemme Tanja, Ranganathan Rama, and Kuriyan John. Deconstruction of the Ras switching cycle through saturation mutagenesis. eLife, 6:e27810, July 2017. ISSN 2050–084X. doi: 10.7554/eLife.27810. URL https://elifesciences.org/articles/27810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Bloom Jesse D. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Molecular Biology and Evolution, 31(8):1956–1978, August 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Bolognesi Benedetta, Faure Andre J., Seuma Mireia, Schmiedel Jörn M., Tartaglia Gian Gaetano, and Lehner Ben. The mutational landscape of a prion-like domain. Nature Communications, 10(1):4162, December 2019. ISSN 2041–1723. doi: 10.1038/s41467-019-12101-z. URL http://www.nature.com/articles/s41467-019-12101-z.
  14. Boucher Jeffrey I, Bolon Daniel NA, and Tawfik Dan S. Quantifying and understanding the fitness effects of protein mutations: Laboratory versus nature. Protein Science, 25(7):1219–1226, 2016.
  15. Brandes Nadav and Ntranos Vasilis. ESM variants - data & code for analysis and figures, June 2023. doi: 10.5281/zenodo.8088402.
  16. Brandes Nadav, Goldman Grant, Wang Charlotte H, Ye Chun Jimmie, and Ntranos Vasilis. Genome-wide prediction of disease variant effects with a deep protein language model. Nature Genetics, 55(9):1512–1522, 2023.
  17. Brenan Lisa, Andreev Aleksandr, Cohen Ofir, Pantel Sasha, Kamburov Atanas, Cacchiarelli Davide, Persky Nicole S., Zhu Cong, Bagul Mukta, Goetz Eva M., Burgin Alex B., Garraway Levi A., Getz Gad, Mikkelsen Tarjei S., Piccioni Federica, Root David E., and Johannessen Cory M.. Phenotypic Characterization of a Comprehensive Set of MAPK1/ERK2 Missense Mutants. Cell Reports, 17(4):1171–1183, October 2016. ISSN 2211–1247. doi: 10.1016/j.celrep.2016.09.061. URL https://linkinghub.elsevier.com/retrieve/pii/S2211124716313171.
  18. Bridgford Jessica L., Lee Su Min, Lee Christine M. M., Guglielmelli Paola, Rumi Elisa, Pietra Daniela, Wilcox Stephen, Chhabra Yash, Rubin Alan F., Cazzola Mario, Vannucchi Alessandro M., Brooks Andrew J., Call Matthew E., and Call Melissa J.. Novel drivers and modifiers of MPL-dependent oncogenic transformation identified by deep mutational scanning. Blood, 135(4):287–292, January 2020. ISSN 0006–4971, 1528–0020. doi: 10.1182/blood.2019002561. URL https://ashpublications.org/blood/article/135/4/287/381157/Novel-drivers-and-modifiers-of-MPLdependent.
  19. Campbell Ian J, Atkinson Joshua T, Carpenter Matthew D, Myerscough Dru, Su Lin, Ajo-Franklin Caroline M, and Silberg Jonathan J. Determinants of multiheme cytochrome extracellular electron transfer uncovered by systematic peptide insertion. Biochemistry, 61(13):1337–1350, July 2022.
  20. Capel Henriette, Weiler Robin, Dijkstra Maurits, Vleugels Reinier, Bloem Peter, and Feenstra K. Anton. ProteinGLUE multi-task benchmark suite for self-supervised protein modeling. Scientific Reports, 12(1):16047, September 2022. ISSN 2045–2322. doi: 10.1038/s41598-022-19608-4. URL https://www.nature.com/articles/s41598-022-19608-4.
  21. Carter Hannah, Douville Christopher, Stenson Peter D, Cooper David N, and Karchin Rachel. Identifying mendelian disease genes with the variant effect scoring tool. BMC genomics, 14(3):1–16, 2013.
  22. Chakraborty Sujata, Ahler Ethan, Simon Jessica J, Fang Linglan, Potter Zachary E, Sitko Katherine A, Stephany Jason J, Guttman Miklos, Fowler Douglas M, and Maly Dustin J. Profiling of the drug resistance of thousands of Src tyrosine kinase mutants uncovers a regulatory network that couples autoinhibition to catalytic domain dynamics. December 2021.
  23. Chan Kui K., Dorosky Danielle, Sharma Preeti, Abbasi Shawn A., Dye John M., Kranz David M., Herbert Andrew S., and Procko Erik. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science, 369(6508):1261–1265, September 2020. ISSN 0036–8075, 1095–9203. doi: 10.1126/science.abc0870. URL https://www.science.org/doi/10.1126/science.abc0870.
  24. Chan Yvonne H., Venev Sergey V., Zeldovich Konstantin B., and Matthews C. Robert. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints. Nature Communications, 8(1):14614, April 2017. ISSN 2041–1723. doi: 10.1038/ncomms14614. URL http://www.nature.com/articles/ncomms14614.
  25. Chen John Z, Fowler Douglas M, and Tokuriki Nobuhiko. Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase. eLife, 9:e56707, June 2020. ISSN 2050–084X. doi: 10.7554/eLife.56707. URL https://elifesciences.org/articles/56707.
  26. Chen Tianlong, Gong Chengyue, Diaz Daniel Jesus, Chen Xuxi, Wells Jordan Tyler, Liu Qiang, Wang Zhangyang, Ellington Andrew, Dimakis Alex, and Klivans Adam. HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing. October 2022. URL https://openreview.net/forum?id=RtV_iEbWeGE.
  27. Chen Yongcan, Hu Ruyun, Li Keyi, Zhang Yating, Fu Lihao, Zhang Jianzhi, and Si Tong. Deep mutational scanning of an Oxygen-Independent fluorescent protein CreiLOV for comprehensive profiling of mutational and epistatic effects. ACS Synthetic Biology, 12(5):1461–1473, May 2023.
  28. Chiasson Melissa A, Rollins Nathan J, Stephany Jason J, Sitko Katherine A, Matreyek Kenneth A, Verby Marta, Sun Song, Roth Frederick P, DeSloover Daniel, Marks Debora S, Rettie Allan E, and Fowler Douglas M. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife, 9:e58026, September 2020. ISSN 2050–084X. doi: 10.7554/eLife.58026. URL https://elifesciences.org/articles/58026.
  29. Choi Yongwook, Sims Gregory E, Murphy Sean, Miller Jason R, and Chan Agnes P. Predicting the functional effect of amino acid substitutions and indels. PLoS One, 7(10):e46688, October 2012.
  30. Chun Sung and Fay Justin C. Identification of deleterious mutations within three human genomes. Genome research, 19(9):1553–1561, 2009.
  31. Clausen Lene, Voutsinos Vasileios, Cagiada Matteo, Johansson Kristoffer E, Grønbæk-Thygesen Martin, Nariya Snehal, Powell Rachel L, Have Magnus K N, Oestergaard Vibe H, Stein Amelie, Fowler Douglas M, Lindorff-Larsen Kresten, and Hartmann-Petersen Rasmus. A mutational atlas for parkin proteostasis. June 2023.
  32. Coyote-Maestas Willow, Nedrud David, He Yungui, and Schmidt Daniel. Determinants of trafficking, conduction, and disease within a K+ channel revealed through multiparametric deep mutational scanning. eLife, 11:e76903, May 2022. ISSN 2050–084X. doi: 10.7554/eLife.76903. URL https://elifesciences.org/articles/76903.
  33. Dallago Christian, Mou Jody, Johnston Kadina E, Wittmann Bruce J, Bhattacharya Nicholas, Goldman Samuel, Madani Ali, and Yang Kevin K. FLIP: Benchmark tasks in fitness landscape inference for proteins. 2021.
  34. Dandage Rohan, Pandey Rajesh, Jayaraj Gopal, Rai Manish, Berger David, and Chakraborty Kausik. Differential strengths of molecular determinants guide environment specific mutational fates. PLOS Genetics, 14(5):e1007419, May 2018. ISSN 1553–7404. doi: 10.1371/journal.pgen.1007419. URL https://dx.plos.org/10.1371/journal.pgen.1007419.
  35. Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte R J, Milles L F, Wicky B I M, Courbet A, de Haas R J, Bethel N, Leung P J Y, Huddy T F, Pellock S, Tischer D, Chan F, Koepnick B, Nguyen H, Kang A, Sankaran B, Bera A K, King N P, and Baker D. Robust deep learning-based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, October 2022.
  36. Dawson Natalie L., Lewis Tony E., Das Sayoni, Lees Jonathan G., Lee David A., Ashford Paul, Orengo Christine A., and Sillitoe Ian P. W.. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Research, 45:D289–D295, 2016. URL https://api.semanticscholar.org/CorpusID:9356024.
  37. Deng Zhifeng, Huang Wanzhi, Bakkalbasi Erol, Brown Nicholas G., Adamski Carolyn J., Rice Kacie, Muzny Donna, Gibbs Richard A., and Palzkill Timothy. Deep Sequencing of Systematic Combinatorial Libraries Reveals beta-Lactamase Sequence Constraints at High Resolution. Journal of Molecular Biology, 424(3–4):150–167, December 2012. ISSN 0022–2836. doi: 10.1016/j.jmb.2012.09.014. URL https://linkinghub.elsevier.com/retrieve/pii/S0022283612007711.
  38. Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
  39. Ding David, Shaw Ada, Sinai Sam, Rollins Nathan, Prywes Noam, Savage David F, Laub Michael T, and Marks Debora S. Protein design using structure-based residue preferences. June 2023.
  40. Doud Michael and Bloom Jesse. Accurate Measurement of the Effects of All Amino-Acid Mutations on Influenza Hemagglutinin. Viruses, 8(6):155, June 2016. ISSN 1999–4915. doi: 10.3390/v8060155. URL http://www.mdpi.com/1999-4915/8/6/155.
  41. Doud Michael B., Ashenberg Orr, and Bloom Jesse D.. Site-Specific Amino Acid Preferences Are Mostly Conserved in Two Closely Related Protein Homologs. Molecular Biology and Evolution, 32(11):2944–2960, November 2015. ISSN 0737–4038, 1537–1719. doi: 10.1093/molbev/msv167. URL https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msv167.
  42. Duenas-Decamp Maria, Jiang Li, Bolon Daniel, and Clapham Paul R.. Saturation Mutagenesis of the HIV-1 Envelope CD4 Binding Loop Reveals Residues Controlling Distinct Trimer Conformations. PLOS Pathogens, 12(11):e1005988, November 2016. ISSN 1553–7374. doi: 10.1371/journal.ppat.1005988. URL https://dx.plos.org/10.1371/journal.ppat.1005988.
  43. Durbin Richard, Eddy Sean, Krogh Anders, and Mitchison Graeme. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
  44. Eddy Sean R. Accelerated profile HMM searches. PLoS Comput. Biol., 7(10):e1002195, October 2011.
  45. Elazar Assaf, Weinstein Jonathan, Biran Ido, Fridman Yearit, Bibi Eitan, and Fleishman Sarel Jacob. Mutational scanning reveals the determinants of protein insertion and association energetics in the plasma membrane. eLife, 5:e12125, January 2016. ISSN 2050–084X. doi: 10.7554/eLife.12125. URL https://elifesciences.org/articles/12125.
  46. Elnaggar Ahmed, Heinzinger Michael, Dallago Christian, Rehawi Ghalia, Yu Wang, Jones Llion, Gibbs Tom, Fehér Tamas B., Angerer Christoph, Steinegger Martin, Bhowmik Debsindhu, and Rost Burkhard. ProtTrans: Towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP, 2021.
  47. Engelen Stefan, Trojan Ladislas A, Sacquin-Mora Sophie, Lavery Richard, and Carbone Alessandra. Joint evolutionary trees: a large-scale method to predict protein interfaces based on sequence sampling. PLoS computational biology, 5(1):e1000267, 2009.
  48. Erwood Steven, Bily Teija M. I., Lequyer Jason, Yan Joyce, Gulati Nitya, Brewer Reid A., Zhou Liangchi, Pelletier Laurence, Ivakine Evgueni A., and Cohn Ronald D.. Saturation variant interpretation using CRISPR prime editing. Nature Biotechnology, 40(6):885–895, June 2022. ISSN 1087–0156, 1546–1696. doi: 10.1038/s41587-021-01201-1. URL https://www.nature.com/articles/s41587-021-01201-1.
  49. Estevam Gabriella O., Linossi Edmond M., Macdonald Christian B., Espinoza Carla A., Michaud Jennifer M., Coyote-Maestas Willow, Collisson Eric A., Jura Natalia, and Fraser James S.. Conserved regulatory motifs in the juxtamembrane domain and kinase N-lobe revealed through deep mutational scanning of the MET receptor tyrosine kinase domain. bioRxiv preprint, August 2023. URL http://biorxiv.org/lookup/doi/10.1101/2023.08.03.551866.
  50. Faure Andre J., Domingo Júlia, Schmiedel Jörn M., Hidalgo-Carcedo Cristina, Diss Guillaume, and Lehner Ben. Mapping the energetic and allosteric landscapes of protein binding domains. Nature, 604(7904):175–183, April 2022. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-022-04586-4. URL https://www.nature.com/articles/s41586-022-04586-4.
  51. Feng Bing-Jian. PERCH: a unified framework for disease gene prioritization. Human mutation, 38(3):243–251, 2017.
  52. Fernandes Jason D., Faust Tyler B., Strauli Nicolas B., Smith Cynthia, Crosby David C., Nakamura Robert L., Hernandez Ryan D., and Frankel Alan D.. Functional Segregation of Overlapping Genes in HIV. Cell, 167(7):1762–1773.e12, December 2016. ISSN 0092–8674. doi: 10.1016/j.cell.2016.11.031. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867416316038.
  53. Ferruz Noelia, Schmidt Steffen, and Höcker Birte. ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 13, 2022.
  54. Findlay Gregory M., Daza Riza M., Martin Beth, Zhang Melissa D., Leith Anh P., Gasperini Molly, Janizek Joseph D., Huang Xingfan, Starita Lea M., and Shendure Jay. Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562(7726):217–222, October 2018. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-018-0461-z. URL http://www.nature.com/articles/s41586-018-0461-z.
  55. Firnberg Elad, Labonte Jason W., Gray Jeffrey J., and Ostermeier Marc. A Comprehensive, High-Resolution Map of a Gene’s Fitness Landscape. Molecular Biology and Evolution, 31(6):1581–1592, June 2014. ISSN 1537–1719, 0737–4038. doi: 10.1093/molbev/msu081. URL https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msu081.
  56. Flynn Julia M, Rossouw Ammeret, Cote-Hammarlof Pamela, Fragata Inês, Mavor David, Hollins Carl, Bank Claudia, and Bolon Daniel N. A.. Comprehensive fitness maps of Hsp90 show widespread environmental dependence. eLife, 9:e53810, March 2020. ISSN 2050–084X. doi: 10.7554/eLife.53810. URL https://elifesciences.org/articles/53810.
  57. Flynn Julia M., Samant Neha, Schneider-Nachum Gily, Barkan David T., Yilmaz Nese Kurt, Schiffer Celia A., Moquin Stephanie A., Dovala Dustin, and Bolon Daniel N.A.. Comprehensive fitness landscape of SARS-CoV-2 Mpro reveals insights into viral resistance mechanisms. bioRxiv preprint, January 2022. URL http://biorxiv.org/lookup/doi/10.1101/2022.01.26.477860.
  58. Frazer Jonathan, Notin Pascal, Dias Mafalda, Gomez Aidan, Min Joseph K, Brock Kelly P., Gal Yarin, and Marks Debora S.. Disease variant prediction with deep generative models of evolutionary data. Nature, 2021.
  59. Gajula Kiran S, Huwe Peter J, Mo Charlie Y, Crawford Daniel J, Stivers James T, Radhakrishnan Ravi, and Kohli Rahul M. High-throughput mutagenesis reveals functional determinants for DNA targeting by activation-induced deaminase. Nucleic Acids Research, 42(15):9964–9975, September 2014.
  60. Gao Zhangyang, Tan Cheng, and Li Stan Z.. PiFold: Toward effective and efficient protein inverse folding. ArXiv, abs/2209.12643, 2022. URL https://api.semanticscholar.org/CorpusID:252596302.
  61. Gersing Sarah, Cagiada Matteo, Gebbia Marinella, Gjesing Anette P., Coté Atina G., Seesankar Gireesh, Li Roujia, Tabet Daniel, Stein Amelie, Gloyn Anna L., Hansen Torben, Roth Frederick P., Lindorff-Larsen Kresten, and Hartmann-Petersen Rasmus. A comprehensive map of human glucokinase variant activity. bioRxiv preprint, May 2022. URL http://biorxiv.org/lookup/doi/10.1101/2022.05.04.490571.
  62. Gersing Sarah, Schulze Thea K, Cagiada Matteo, Stein Amelie, Roth Frederick P, Lindorff-Larsen Kresten, and Hartmann-Petersen Rasmus. Characterizing glucokinase variant mechanisms using a multiplexed abundance assay. bioRxiv, May 2023.
  63. Ghose Dia A, Przydzial Kaitlyn E, Mahoney Emily M, Keating Amy E, and Laub Michael T. Marginal specificity in protein interactions constrains evolution of a paralogous family. Proceedings of the National Academy of Sciences of the United States of America, 120(18):e2221163120, May 2023.
  64. Giacomelli Andrew O., Yang Xiaoping, Lintner Robert E., McFarland James M., Duby Marc, Kim Jaegil, Howard Thomas P., Takeda David Y., Ly Seav Huong, Kim Eejung, Gannon Hugh S., Hurhula Brian, Sharpe Ted, Goodale Amy, Fritchman Briana, Steelman Scott, Vazquez Francisca, Tsherniak Aviad, Aguirre Andrew J., Doench John G., Piccioni Federica, Roberts Charles W. M., Meyerson Matthew, Getz Gad, Johannessen Cory M., Root David E., and Hahn William C.. Mutational processes shape the landscape of TP53 mutations in human cancer. Nature Genetics, 50(10):1381–1387, October 2018. ISSN 1061–4036, 1546–1718. doi: 10.1038/s41588-018-0204-y. URL https://www.nature.com/articles/s41588-018-0204-y.
  65. Gill Kevin S, Mehta Kritika, Heredia Jeremiah D, Krishnamurthy Vishnu V, Zhang Kai, and Procko Erik. Multiple mechanisms of self-association of chemokine receptors CXCR4 and CCR5 demonstrated by deep mutagenesis. bioRxiv, March 2023.
  66. Glazer Andrew M., Kroncke Brett M., Matreyek Kenneth A., Yang Tao, Wada Yuko, Shields Tiffany, Salem Joe-Elie, Fowler Douglas M., and Roden Dan M.. Deep Mutational Scan of an SCN5A Voltage Sensor. Circulation: Genomic and Precision Medicine, 13(1):e002786, February 2020. ISSN 2574–8300. doi: 10.1161/CIRCGEN.119.002786. URL https://www.ahajournals.org/doi/10.1161/CIRCGEN.119.002786.
  67. Gonzalez Courtney E., Roberts Paul, and Ostermeier Marc. Fitness Effects of Single Amino Acid Insertions and Deletions in TEM-1 beta-Lactamase. Journal of Molecular Biology, 431(12):2320–2330, May 2019. ISSN 0022–2836. doi: 10.1016/j.jmb.2019.04.030. URL https://linkinghub.elsevier.com/retrieve/pii/S0022283619302372.
  68. Gonzalez Somermeyer Louisa, Fleiss Aubin, Mishin Alexander S, Bozhanova Nina G, Igolkina Anna A, Meiler Jens, Alaball Pujol Maria-Elisenda, Putintseva Ekaterina V, Sarkisyan Karen S, and Kondrashov Fyodor A. Heterogeneity of the GFP fitness landscape and data-driven protein design. eLife, 11:e75842, May 2022. ISSN 2050–084X. doi: 10.7554/eLife.75842. URL https://elifesciences.org/articles/75842.
  69. Gray Vanessa E, Sitko Katherine, Ngako Kameni Floriane Z, Williamson Miriam, Stephany Jason J, Hasle Nicholas, and Fowler Douglas M. Elucidating the molecular determinants of Aβ aggregation with deep mutational scanning. G3, 9(11):3683–3689, November 2019.
  70. Grimm Dominik G, Azencott Chloé-Agathe, Aicheler Fabian, Gieraths Udo, MacArthur Daniel G, Samocha Kaitlin E, Cooper David N, Stenson Peter D, Daly Mark J, Smoller Jordan W, Duncan Laramie E, and Borgwardt Karsten M. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat., 36(5):513–523, May 2015.
  71. Haddox Hugh K, Dingens Adam S, Hilton Sarah K, Overbaugh Julie, and Bloom Jesse D. Mapping mutational effects along the evolutionary landscape of HIV envelope. eLife, 7:e34420, March 2018. ISSN 2050–084X. doi: 10.7554/eLife.34420. URL https://elifesciences.org/articles/34420.
  72. Heinzinger Michael, Elnaggar Ahmed, Wang Yu, Dallago Christian, Nechaev Dmitrii, Matthes Florian, and Rost Burkhard. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics, 20(1):723, December 2019. ISSN 1471–2105. doi: 10.1186/s12859-019-3220-8.
  73. Hesslow Daniel, Zanichelli Niccolò, Notin Pascal, Poli Iacopo, and Marks Debora S.. RITA: a study on scaling up generative protein sequence models. ArXiv, abs/2205.05789, 2022.
  74. Hietpas Ryan T, Jensen Jeffrey D, and Bolon Daniel N A. Experimental illumination of a fitness landscape. Proceedings of the National Academy of Sciences of the United States of America, 108(19):7896–7901, May 2011.
  75. Ho Jonathan, Kalchbrenner Nal, Weissenborn Dirk, and Salimans Tim. Axial attention in multidimensional transformers. ArXiv, abs/1912.12180, 2019a. URL https://api.semanticscholar.org/CorpusID:209323787.
  76. Ho Jonathan, Kalchbrenner Nal, Weissenborn Dirk, and Salimans Tim. Axial attention in multidimensional transformers. arXiv preprint arXiv:1912.12180, 2019b.
  77. Hobbs Helen T., Shah Neel H., Shoemaker Sophie R., Amacher Jeanine F., Marqusee Susan, and Kuriyan John. Saturation mutagenesis of a predicted ancestral Syk-family kinase. Protein Science, 31(10), October 2022. ISSN 0961–8368, 1469–896X. doi: 10.1002/pro.4411. URL https://onlinelibrary.wiley.com/doi/10.1002/pro.4411.
  78. Hom Nancy, Gentles Lauren, Bloom Jesse D, and Lee Kelly K. Deep mutational scan of the highly conserved influenza a virus M1 matrix protein reveals substantial intrinsic mutational tolerance. Journal of Virology, 93(13), July 2019.
  79. Hon Jirí, Marusiak Martin, Martínek Tomáš, Kunka Antonin, Zendulka Jaroslav, Bednář David, and Damborský Jiří. SoluProt: prediction of soluble protein expression in Escherichia coli. Bioinformatics, 37:23–28, 2020.
  80. Hopf Thomas A, Ingraham John B, Poelwijk Frank J, Schärfe Charlotta PI, Springer Michael, Sander Chris, and Marks Debora S. Mutation effects predicted from sequence co-variation. Nature biotechnology, 35(2):128–135, 2017.
  81. Hsu Chloe, Nisonoff Hunter, Fannjiang Clara, and Listgarten Jennifer. Learning protein fitness models from evolutionary and assay-labeled data. Nature Biotechnology, 40(7):1114–1122, July 2022a. ISSN 1087–0156, 1546–1696. doi: 10.1038/s41587-021-01146-5. URL https://www.nature.com/articles/s41587-021-01146-5.
  82. Hsu Chloe, Verkuil Robert, Liu Jason, Lin Zeming, Hie Brian, Sercu Tom, Lerer Adam, and Rives Alexander. Learning inverse folding from millions of predicted structures. April 2022b.
  83. Huang Po-Ssu, Boyken Scott E., and Baker David. The coming of age of de novo protein design. Nature, 537:320–327, 2016. URL https://api.semanticscholar.org/CorpusID:205251398.
  84. Huttinger Zachary M, Haynes Laura M, Yee Andrew, Kretz Colin A, Holding Matthew L, Siemieniak David R, Lawrence Daniel A, and Ginsburg David. Deep mutational scanning of the plasminogen activator inhibitor-1 functional landscape. Scientific Reports, 11(1):18827, September 2021.
  85. Ingraham John, Garg Vikas, Barzilay Regina, and Jaakkola Tommi. Generative models for graph-based protein design. Advances in neural information processing systems, 32, 2019.
  86. Ioannidis Nilah M, Rothstein Joseph H, Pejaver Vikas, Middha Sumit, McDonnell Shannon K, Baheti Saurabh, Musolf Anthony, Li Qing, Holzinger Emily, Karyadi Danielle, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. The American Journal of Human Genetics, 99(4):877–885, 2016.
  87. Jacquier Hervé, Birgy André, Le Nagard Hervé, Mechulam Yves, Schmitt Emmanuelle, Glodt Jérémy, Bercot Beatrice, Petit Emmanuelle, Poulain Julie, Barnaud Guilène, Gros Pierre-Alexis, and Tenaillon Olivier. Capturing the mutational landscape of the beta-lactamase TEM-1. Proceedings of the National Academy of Sciences, 110(32):13067–13072, August 2013. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1215206110. URL https://pnas.org/doi/full/10.1073/pnas.1215206110.
  88. Jagota Milind, Ye Chengzhong, Rastogi Ruchir, Albors Carlos, Koehl Antoine, Ioannidis Nilah M., and Song Yun S.. Cross-protein transfer learning substantially improves zero-shot prediction of disease variant effects. 2022. URL https://api.semanticscholar.org/CorpusID:253628877.
  89. Jia Xiaoyan, Burugula Bala Bharathi, Chen Victor, Lemons Rosemary M., Jayakody Sajini, Maksutova Mariam, and Kitzman Jacob O.. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. The American Journal of Human Genetics, 108(1):163–175, January 2021. ISSN 0002–9297. doi: 10.1016/j.ajhg.2020.12.003. URL https://linkinghub.elsevier.com/retrieve/pii/S0002929720304390.
  90. Jiang Li, Liu Ping, Bank Claudia, Renzette Nicholas, Prachanronarong Kristina, Yilmaz Lutfu S., Caffrey Daniel R., Zeldovich Konstantin B., Schiffer Celia A., Kowalik Timothy F., Jensen Jeffrey D., Finberg Robert W., Wang Jennifer P., and Bolon Daniel N.A.. A Balance between Inhibitor Binding and Substrate Processing Confers Influenza Drug Resistance. Journal of Molecular Biology, 428(3):538–553, February 2016. ISSN 0022–2836. doi: 10.1016/j.jmb.2015.11.027. URL https://linkinghub.elsevier.com/retrieve/pii/S0022283615006907.
  91. Jiang Rosanna Junchen. Exhaustive Mapping of Missense Variation in Coronary Heart Disease-related Genes. PhD thesis, University of Toronto, November 2019. URL https://hdl.handle.net/1807/98076.
  92. Jing Bowen, Eismann Stephan, Suriana Patricia, Townshend Raphael J L, and Dror Ron. Learning from protein structure with geometric vector perceptrons. September 2020.
  93. Jones Eric M, Lubock Nathan B, Venkatakrishnan Aj, Wang Jeffrey, Tseng Alex M, Paggi Joseph M, Latorraca Naomi R, Cancilla Daniel, Satyadi Megan, Davis Jessica E, Babu M Madan, Dror Ron O, and Kosuri Sriram. Structural and functional characterization of G protein–coupled receptors with deep mutational scanning. eLife, 9:e54895, October 2020. ISSN 2050–084X. doi: 10.7554/eLife.54895. URL https://elifesciences.org/articles/54895.
  94. Jumper John, Evans Richard, Pritzel Alexander, Green Tim, Figurnov Michael, Ronneberger Olaf, Tunyasuvunakool Kathryn, Bates Russ, Žídek Augustin, Potapenko Anna, Bridgland Alex, Meyer Clemens, Kohl Simon A A, Ballard Andrew J, Cowie Andrew, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Adler Jonas, Back Trevor, Petersen Stig, Reiman David, Clancy Ellen, Zielinski Michal, Steinegger Martin, Pacholska Michalina, Berghammer Tamas, Bodenstein Sebastian, Silver David, Vinyals Oriol, Senior Andrew W, Kavukcuoglu Koray, Kohli Pushmeet, and Hassabis Demis. Highly accurate protein structure prediction with AlphaFold. Nature, July 2021.
  95. Karczewski Konrad J, Francioli Laurent C, Tiao Grace, Cummings Beryl B, Alföldi Jessica, Wang Qingbo, Collins Ryan L, Laricchia Kristen M, Ganna Andrea, Birnbaum Daniel P, Gauthier Laura D, Brand Harrison, Solomonson Matthew, Watts Nicholas A, Rhodes Daniel, Singer-Berk Moriel, England Eleina M, Seaby Eleanor G, Kosmicki Jack A, Walters Raymond K, Tashman Katherine, Farjoun Yossi, Banks Eric, Poterba Timothy, Wang Arcturus, Seed Cotton, Whiffin Nicola, Chong Jessica X, Samocha Kaitlin E, Pierce-Hoffman Emma, Zappala Zachary, O’Donnell-Luria Anne H, Minikel Eric Vallabh, Weisburd Ben, Lek Monkol, Ware James S, Vittal Christopher, Armean Irina M, Bergelson Louis, Cibulskis Kristian, Connolly Kristen M, Covarrubias Miguel, Donnelly Stacey, Ferriera Steven, Gabriel Stacey, Gentry Jeff, Gupta Namrata, Jeandet Thibault, Kaplan Diane, Llanwarne Christopher, Munshi Ruchi, Novod Sam, Petrillo Nikelle, Roazen David, Ruano-Rubio Valentin, Saltzman Andrea, Schleicher Molly, Soto Jose, Tibbetts Kathleen, Tolonen Charlotte, Wade Gordon, Talkowski Michael E, Genome Aggregation Database Consortium, Neale Benjamin M, Daly Mark J, and MacArthur Daniel G. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809):434–443, May 2020.
  96. Kelsic Eric D., Chung Hattie, Cohen Niv, Park Jimin, Wang Harris H., and Kishony Roy. RNA Structural Determinants of Optimal Codons Revealed by MAGE-Seq. Cell Systems, 3(6):563–571.e6, December 2016. ISSN 2405–4712. doi: 10.1016/j.cels.2016.11.004. URL https://linkinghub.elsevier.com/retrieve/pii/S2405471216303684.
  97. Kennouche Paul, Charles-Orszag Arthur, Nishiguchi Daiki, Goussard Sylvie, Imhaus Anne-Flore, Dupré Mathieu, Chamot-Rooke Julia, and Duménil Guillaume. Deep mutational scanning of the Neisseria meningitidis major pilin reveals the importance of pilus tip-mediated adhesion. The EMBO Journal, 38(22):e102145, November 2019. ISSN 0261–4189, 1460–2075. doi: 10.15252/embj.2019102145. URL https://www.embopress.org/doi/10.15252/embj.2019102145.
  98. Kitzman Jacob O, Starita Lea M, Lo Russell S, Fields Stanley, and Shendure Jay. Massively parallel single-amino-acid mutagenesis. Nature Methods, 12(3):203–206, March 2015. ISSN 1548–7091, 1548–7105. doi: 10.1038/nmeth.3223. URL http://www.nature.com/articles/nmeth.3223.
  99. Klesmith Justin R., Bacik John-Paul, Michalczyk Ryszard, and Whitehead Timothy A.. Comprehensive Sequence-Flux Mapping of a Levoglucosan Utilization Pathway in E. coli. ACS Synthetic Biology, 4(11):1235–1243, November 2015. ISSN 2161–5063, 2161–5063. doi: 10.1021/acssynbio.5b00131. URL https://pubs.acs.org/doi/10.1021/acssynbio.5b00131.
  100. Klesmith Justin R., Su Lihe, Wu Lan, Schrack Ian A., Dufort Fay J., Birt Alyssa, Ambrose Christine, Hackel Benjamin J., Lobb Roy R., and Rennert Paul D.. Retargeting CD19 Chimeric Antigen Receptor T Cells via Engineered CD19-Fusion Proteins. Molecular Pharmaceutics, 16(8):3544–3558, August 2019. ISSN 1543–8384, 1543–8392. doi: 10.1021/acs.molpharmaceut.9b00418. URL https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.9b00418.
  101. Kossen Jannik, Band Neil, Lyle Clare, Gomez Aidan N., Rainforth Tom, and Gal Yarin. Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning, February 2022. URL http://arxiv.org/abs/2106.02584. arXiv:2106.02584 [cs, stat] version: 2.
  102. Kotler Eran, Shani Odem, Goldfeld Guy, Lotan-Pompan Maya, Tarcic Ohad, Gershoni Anat, Hopf Thomas A., Marks Debora S., Oren Moshe, and Segal Eran. A Systematic p53 Mutation Library Links Differential Functional Impact to Cancer Mutation Pattern and Evolutionary Conservation. Molecular Cell, 71(1):178–190.e8, July 2018. ISSN 10972765. doi: 10.1016/j.molcel.2018.06.012. URL https://linkinghub.elsevier.com/retrieve/pii/S1097276518304544. [DOI] [PubMed] [Google Scholar]
  103. Kozek Krystian A., Glazer Andrew M., Ng Chai-Ann, Blackwell Daniel, Egly Christian L., Vanags Loren R., Blair Marcia, Mitchell Devyn, Matreyek Kenneth A., Fowler Douglas M., Knollmann Bjorn C., Vandenberg Jamie I., Roden Dan M., and Kroncke Brett M.. High-throughput discovery of trafficking-deficient variants in the cardiac potassium channel KV11.1. Heart Rhythm, 17(12):2180–2189, December 2020. ISSN 15475271. doi: 10.1016/j.hrthm.2020.05.041. URL https://linkinghub.elsevier.com/retrieve/pii/S1547527120305427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Kryshtafovych Andriy, Schwede Torsten, Topf Maya, Fidelis Krzysztof, and Moult John. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics, 89(12):1607–1617, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Kwon Jason J., Hajian Behnoush, Bian Yuemin, Young Lucy C., Amor Alvaro J., Fuller James R., Fraley Cara V., Sykes Abbey M., So Jonathan, Pan Joshua, Baker Laura, Lee Sun Joo, Wheeler Douglas B., Mayhew David L., Persky Nicole S., Yang Xiaoping, Root David E., Barsotti Anthony M., Stamford Andrew W., Perry Charles K., Burgin Alex, McCormick Frank, Lemke Christopher T., Hahn William C., and Aguirre Andrew J.. Structure–function analysis of the SHOC2–MRAS–PP1C holophosphatase complex. Nature, 609 (7926):408–415, September 2022. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-022-04928-2. URL https://www.nature.com/articles/s41586-022-04928-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Laine Elodie, Karami Yasaman, and Carbone Alessandra. GEMME: A Simple and Fast Global Epistatic Model Predicting Mutational Effects. Molecular Biology and Evolution, 36(11):2604–2619, November 2019. ISSN 0737–4038. doi: 10.1093/molbev/msz179. URL https://doi.org/10.1093/molbev/msz179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Landrum Melissa J. and Kattman Brandi L.. ClinVar at five years: Delivering on the promise. Human Mutation, 39(11):1623–1630, November 2018. ISSN 1098–1004. doi: 10.1002/humu.23641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Lee Juhye M., Huddleston John, Doud Michael B., Hooper Kathryn A., Wu Nicholas C., Bedford Trevor, and Bloom Jesse D.. Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants. Proceedings of the National Academy of Sciences, 115(35), August 2018. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1806133115. URL https://pnas.org/doi/full/10.1073/pnas.1806133115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Lei Ruipeng, Garcia Andrea Hernandez, Tan Timothy J C, Teo Qi Wen, Wang Yiquan, Zhang Xiwen, Luo Shitong, Nair Satish K, Peng Jian, and Wu Nicholas C. Mutational fitness landscape of human influenza H3N2 neuraminidase. Cell Reports, 42(1):111951, January 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Li Biao, Krishnan Vidhya G, Mort Matthew E, Xin Fuxiao, Kamati Kishore K, Cooper David N, Mooney Sean D, and Radivojac Predrag. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics, 25(21):2744–2750, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Li Chang, Zhi Degui, Wang Kai, and Liu Xiaoming. MetaRNN: differentiating rare pathogenic and rare benign missense SNVs and InDels using deep learning. Genome Medicine, 14(1):115, October 2022. ISSN 1756–994X. doi: 10.1186/s13073-022-01120-z. URL https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-022-01120-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Li Yuan, Arcos Sarah, Sabsay Kimberly R., Te Velthuis Aartjan J.W., and Lauring Adam S.. Deep mutational scanning reveals the functional constraints and evolutionary potential of the influenza A virus PB1 protein. preprint, Microbiology, August 2023. URL http://biorxiv.org/lookup/doi/10.1101/2023.08.27.554986. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Lin Zeming, Akin Halil, Rao Roshan, Hie Brian, Zhu Zhongkai, Lu Wenting, Smetanin Nikita, Verkuil Robert, Kabeli Ori, Shmueli Yaniv, dos Santos Costa Allan, Fazel-Zarandi Maryam, Sercu Tom, Candido Salvatore, and Rives Alexander. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, March 2023. doi: 10.1126/science.ade2574. [DOI] [PubMed] [Google Scholar]
  114. Liu Xiaoming, Jian Xueqiu, and Boerwinkle Eric. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Human mutation, 32(8):894–899, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Liu Xiaoming, Li Chang, Mou Chengcheng, Dong Yibo, and Tu Yicheng. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome medicine, 12(1):1–8, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Livesey Benjamin J and Marsh Joseph A. Updated benchmarking of variant effect predictors using deep mutational scanning. Molecular Systems Biology, page e11474, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Lo Russell S, Cromie Gareth A, Tang Michelle, Teng Kevin, Owens Katherine, Sirr Amy, Kutz J Nathan, Morizono Hiroki, Caldovic Ljubica, Mew Nicholas Ah, Gropman Andrea, and Dudley Aimée M. The functional impact of 1,570 individual amino acid substitutions in human OTC. American Journal of Human Genetics, 110(5):863–879, May 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Macdonald Christian B., Nedrud David, Grimes Patrick Rockefeller, Trinidad Donovan, Fraser James S., and Coyote-Maestas Willow. DIMPLE: deep insertion, deletion, and missense mutation libraries for exploring protein variation in evolution, disease, and biology. Genome Biology, 24(1):36, February 2023. ISSN 1474–760X. doi: 10.1186/s13059-023-02880-6. URL https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02880-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. MacRae Mark R, Puvanendran Dhenesh, Haase Max A B, Coudray Nicolas, Kolich Ljuvica, Lam Cherry, Baek Minkyung, Bhabha Gira, and Ekiert Damian C. Protein-protein interactions in the Mla lipid transport system probed by computational structure prediction and deep mutational scanning. Journal of Biological Chemistry, 299(6):104744, June 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Madani Ali, McCann Bryan, Naik Nikhil, Keskar Nitish Shirish, Anand Namrata, Eguchi Raphael R., Huang Po-Ssu, and Socher Richard. ProGen: Language modeling for protein generation, 2020.
  121. Malhis Nawar, Jacobson Matthew, Jones Steven JM, and Gsponer Jörg. LIST-S2: taxonomy based sorting of deleterious missense mutations across species. Nucleic acids research, 48(W1):W154–W161, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Marquet Céline, Heinzinger Michael, Olenyi Tobias, Dallago Christian, Erckert Kyra, Bernhofer Michael, Nechaev Dmitrii, and Rost Burkhard. Embeddings from protein language models predict conservation and variant effects. Human Genetics, 141(10):1629–1647, October 2022. ISSN 1432–1203. doi: 10.1007/s00439-021-02411-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Matreyek Kenneth A., Starita Lea M., Stephany Jason J., Martin Beth, Chiasson Melissa A., Gray Vanessa E., Kircher Martin, Khechaduri Arineh, Dines Jennifer N., Hause Ronald J., Bhatia Smita, Evans William E., Relling Mary V., Yang Wenjian, Shendure Jay, and Fowler Douglas M.. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nature Genetics, 50(6):874–882, June 2018. ISSN 1061–4036, 1546–1718. doi: 10.1038/s41588-018-0122-z. URL https://www.nature.com/articles/s41588-018-0122-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Matreyek Kenneth A., Stephany Jason J., Ahler Ethan, and Fowler Douglas M.. Integrating thousands of PTEN variant activity and abundance measurements reveals variant subgroups and new dominant negatives in cancers. Genome Medicine, 13(1):165, December 2021. ISSN 1756–994X. doi: 10.1186/s13073-021-00984-x. URL https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-021-00984-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Mattenberger Florian, Latorre Victor, Tirosh Omer, Stern Adi, and Geller Ron. Globally defining the effects of mutations in a picornavirus capsid. eLife, 10:e64256, January 2021. ISSN 2050–084X. doi: 10.7554/eLife.64256. URL https://elifesciences.org/articles/64256. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Mavor David, Barlow Kyle, Thompson Samuel, Barad Benjamin A, Bonny Alain R, Cario Clinton L, Gaskins Garrett, Liu Zairan, Deming Laura, Axen Seth D, Caceres Elena, Chen Weilin, Cuesta Adolfo, Gate Rachel E, Green Evan M, Hulce Kaitlin R, Ji Weiyue, Kenner Lillian R, Mensa Bruk, Morinishi Leanna S, Moss Steven M, Mravic Marco, Muir Ryan K, Niekamp Stefan, Nnadi Chimno I, Palovcak Eugene, Poss Erin M, Ross Tyler D, Salcedo Eugenia C, See Stephanie K, Subramaniam Meena, Wong Allison W, Li Jennifer, Thorn Kurt S, Conchúir Shane Ó, Roscoe Benjamin P, Chow Eric D, DeRisi Joseph L, Kortemme Tanja, Bolon Daniel N, and Fraser James S. Determination of ubiquitin fitness landscapes under different chemical stresses in a classroom setting. eLife, 5:e15802, April 2016. ISSN 2050–084X. doi: 10.7554/eLife.15802. URL https://elifesciences.org/articles/15802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. McLaughlin Richard N. Jr., Poelwijk Frank J., Raman Arjun, Gosal Walraj S., and Ranganathan Rama. The spatial architecture of protein function and adaptation. Nature, 491(7422):138–142, November 2012. ISSN 0028–0836, 1476–4687. doi: 10.1038/nature11500. URL https://www.nature.com/articles/nature11500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Meier Gianmarco, Thavarasah Sujani, Ehrenbolger Kai, Hutter Cedric A J, Hürlimann Lea M, Barandun Jonas, and Seeger Markus A. Deep mutational scan of a drug efflux pump reveals its structure-function landscape. Nature Chemical Biology, 19(4):440–450, April 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  129. Meier Joshua, Rao Roshan, Verkuil Robert, Liu Jason, Sercu Tom, and Rives Alexander. Language models enable zero-shot prediction of the effects of mutations on protein function. bioRxiv, 2021. doi: 10.1101/2021.07.09.450648. URL https://www.biorxiv.org/content/early/2021/07/10/2021.07.09.450648. [DOI] [Google Scholar]
  130. Meitlis Iana, Allenspach Eric J., Bauman Bradly M., Phan Isabelle Q., Dabbah Gina, Schmitt Erica G., Camp Nathan D., Torgerson Troy R., Nickerson Deborah A., Bamshad Michael J., Hagin David, Luthers Christopher R., Stinson Jeffrey R., Gray Jessica, Lundgren Ingrid, Church Joseph A., Butte Manish J., Jordan Mike B., Aceves Seema S., Schwartz Daniella M., Milner Joshua D., Schuval Susan, Skoda-Smith Suzanne, Cooper Megan A., Starita Lea M., Rawlings David J., Snow Andrew L., and James Richard G.. Multiplexed Functional Assessment of Genetic Variants in CARD11. The American Journal of Human Genetics, 107(6):1029–1043, December 2020. ISSN 00029297. doi: 10.1016/j.ajhg.2020.10.015. URL https://linkinghub.elsevier.com/retrieve/pii/S0002929720303736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Melamed Daniel, Young David L., Gamble Caitlin E., Miller Christina R., and Fields Stanley. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA, 19(11): 1537–1551, November 2013. ISSN 1355–8382, 1469–9001. doi: 10.1261/rna.040709.113. URL http://rnajournal.cshlp.org/lookup/doi/10.1261/rna.040709.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Melnikov Alexandre, Rogov Peter, Wang Li, Gnirke Andreas, and Mikkelsen Tarjei S.. Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Research, 42(14):e112–e112, August 2014. ISSN 0305–1048, 1362–4962. doi: 10.1093/nar/gku511. URL https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gku511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Mighell Taylor L., Evans-Dutson Sara, and O’Roak Brian J.. A Saturation Mutagenesis Approach to Understanding PTEN Lipid Phosphatase Activity and Genotype-Phenotype Relationships. The American Journal of Human Genetics, 102(5):943–955, May 2018. ISSN 00029297. doi: 10.1016/j.ajhg.2018.03.018. URL https://linkinghub.elsevier.com/retrieve/pii/S0002929718301071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Miller Peter G., Sathappa Murugappan, Moroco Jamie A., Jiang Wei, Qian Yue, Iqbal Sumaiya, Guo Qi, Giacomelli Andrew O., Shaw Subrata, Vernier Camille, Bajrami Besnik, Yang Xiaoping, Raffier Cerise, Sperling Adam S., Gibson Christopher J., Kahn Josephine, Jin Cyrus, Ranaghan Matthew, Caliman Alisha, Brousseau Merissa, Fischer Eric S., Lintner Robert, Piccioni Federica, Campbell Arthur J., Root David E., Garvie Colin W., and Ebert Benjamin L.. Allosteric inhibition of PPM1D serine/threonine phosphatase via an altered conformational state. Nature Communications, 13(1):3778, June 2022. ISSN 2041–1723. doi: 10.1038/s41467-022-30463-9. URL https://www.nature.com/articles/s41467-022-30463-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Mishra Parul, Flynn Julia M., Starr Tyler N., and Bolon Daniel N.A.. Systematic Mutant Analyses Elucidate General and Client-Specific Aspects of Hsp90 Function. Cell Reports, 15(3):588–598, April 2016. ISSN 22111247. doi: 10.1016/j.celrep.2016.03.046. URL https://linkinghub.elsevier.com/retrieve/pii/S2211124716303175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  136. Moal Iain H. and Fernández-Recio Juan. SKEMPI: a structural kinetic and energetic database of mutant protein interactions and its use in empirical models. Bioinformatics, 28(20):2600–2607, 2012. [DOI] [PubMed] [Google Scholar]
  137. Muhammad Ayesha, Calandranis Maria E, Li Bian, Yang Tao, Blackwell Daniel J, Harvey M Lorena, Smith Jeremy E, Chew Ashli E, Capra John A, Matreyek Kenneth A, Fowler Douglas M, Roden Dan M, and Glazer Andrew M. High-throughput functional mapping of variants in an arrhythmia gene, KCNE1, reveals novel biology. bioRxiv, April 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  138. Newberry Robert W., Arhar Taylor, Costello Jean, Hartoularos George C., Maxwell Alison M., Naing Zun Zar Chi, Pittman Maureen, Reddy Nishith R., Schwarz Daniel M. C., Wassarman Douglas R., Wu Taia S., Barrero Daniel, Caggiano Christa, Catching Adam, Cavazos Taylor B., Estes Laurel S., Faust Bryan, Fink Elissa A., Goldman Miriam A., Gomez Yessica K., Gordon M. Grace, Gunsalus Laura M., Hoppe Nick, Jaime-Garza Maru, Johnson Matthew C., Jones Matthew G., Kung Andrew F., Lopez Kyle E., Lumpe Jared, Martyn Calla, McCarthy Elizabeth E., Miller-Vedam Lakshmi E., Navarro Erik J., Palar Aji, Pellegrino Jenna, Saylor Wren, Stephens Christina A., Strickland Jack, Torosyan Hayarpi, Wankowicz Stephanie A., Wong Daniel R., Wong Garrett, Redding Sy, Chow Eric D., DeGrado William F., and Kampmann Martin. Robust Sequence Determinants of alpha-Synuclein Toxicity in Yeast Implicate Membrane Binding. ACS Chemical Biology, 15(8):2137–2153, August 2020. ISSN 1554–8929, 1554–8937. doi: 10.1021/acschembio.0c00339. URL https://pubs.acs.org/doi/10.1021/acschembio.0c00339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Ng Pauline C and Henikoff Steven. Accounting for human polymorphisms predicted to affect protein function. Genome Res., 12(3):436–446, March 2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  140. Nguyen Thuy N, Ingle Christine, Thompson Samuel, and Reynolds Kimberly A. The genetic landscape of a metabolic interaction. bioRxiv, May 2023a. [DOI] [PMC free article] [PubMed]
  141. Nguyen Vanessa, Ahler Ethan, Sitko Katherine A, Stephany Jason J, Maly Dustin J, and Fowler Douglas M. Molecular determinants of Hsp90 dependence of Src kinase revealed by deep mutational scanning. Protein Science, 32(7):e4656, July 2023b. [DOI] [PMC free article] [PubMed] [Google Scholar]
  142. Nijkamp Erik, Ruffolo Jeffrey A., Weinstein Eli N., Naik Nikhil, and Madani Ali. ProGen2: Exploring the boundaries of protein language models. ArXiv, abs/2206.13517, 2022. [DOI] [PubMed] [Google Scholar]
  143. Notin Pascal, Dias Mafalda, Frazer Jonathan, Marchena-Hurtado Javier, Gomez Aidan N., Marks Debora S., and Gal Yarin. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In ICML, 2022a. [Google Scholar]
  144. Notin Pascal, Van Niekerk Lood, Kollasch Aaron W., Ritter Daniel, Gal Yarin, and Marks Debora Susan. TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction. December 2022b. URL https://openreview.net/forum?id=l7Oo9DcLmR1. [Google Scholar]
  145. Notin Pascal, Weitzman Ruben, Marks Debora S., and Gal Yarin. ProteinNPT: Improving protein property prediction and design with non-parametric transformers. Advances in Neural Information Processing Systems, 36, 2023. [Google Scholar]
  146. Nutschel Christina, Fulton Alexander, Zimmermann Olav, Schwaneberg Ulrich, Jaeger Karl-Erich, and Gohlke Holger. Systematically Scrutinizing the Impact of Substitution Sites on Thermostability and Detergent Tolerance for Bacillus subtilis Lipase A. Journal of Chemical Information and Modeling, 60(3):1568–1584, March 2020. ISSN 1549–9596, 1549–960X. doi: 10.1021/acs.jcim.9b00954. URL https://pubs.acs.org/doi/10.1021/acs.jcim.9b00954. [DOI] [PubMed] [Google Scholar]
  147. Olson C. Anders, Wu Nicholas C., and Sun Ren. A Comprehensive Biophysical Description of Pairwise Epistasis throughout an Entire Protein Domain. Current Biology, 24(22):2643–2651, November 2014. ISSN 09609822. doi: 10.1016/j.cub.2014.09.072. URL https://linkinghub.elsevier.com/retrieve/pii/S0960982214012688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Ostermaier Martin K, Peterhans Christian, Jaussi Rolf, Deupi Xavier, and Standfuss Jörg. Functional map of arrestin-1 at single amino acid resolution. Proceedings of the National Academy of Sciences, 111(5): 1825–1830, February 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  149. Pejaver Vikas, Byrne Alicia B, Feng Bing-Jian, Pagel Kymberleigh A, Mooney Sean D, Karchin Rachel, O’Donnell-Luria Anne, Harrison Steven M, Tavtigian Sean V, Greenblatt Marc S, et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. The American Journal of Human Genetics, 109(12):2163–2177, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  150. Pokusaeva Victoria O., Usmanova Dinara R., Putintseva Ekaterina V., Espinar Lorena, Sarkisyan Karen S., Mishin Alexander S., Bogatyreva Natalya S., Ivankov Dmitry N., Akopyan Arseniy V., Avvakumov Sergey Ya., Povolotskaya Inna S., Filion Guillaume J., Carey Lucas B., and Kondrashov Fyodor A.. An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLOS Genetics, 15(4):e1008079, April 2019. ISSN 1553–7404. doi: 10.1371/journal.pgen.1008079. URL https://dx.plos.org/10.1371/journal.pgen.1008079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  151. Qi Hangfei, Olson C. Anders, Wu Nicholas C., Ke Ruian, Loverdo Claude, Chu Virginia, Truong Shawna, Remenyi Roland, Chen Zugen, Du Yushen, Su Sheng-Yao, Al-Mawsawi Laith Q., Wu Ting-Ting, Chen Shu-Hua, Lin Chung-Yen, Zhong Weidong, Lloyd-Smith James O., and Sun Ren. A Quantitative High-Resolution Genetic Profile Rapidly Identifies Sequence Determinants of Hepatitis C Viral Fitness and Drug Sensitivity. PLoS Pathogens, 10(4):e1004064, April 2014. ISSN 1553–7374. doi: 10.1371/journal.ppat.1004064. URL https://dx.plos.org/10.1371/journal.ppat.1004064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  152. Quang Daniel, Chen Yifei, and Xie Xiaohui. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics, 31(5):761–763, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  153. Radford Alec, Wu Jeff, Child Rewon, Luan David, Amodei Dario, and Sutskever Ilya. Language models are unsupervised multitask learners. 2019. URL https://api.semanticscholar.org/CorpusID:160025533.
  154. Raimondi Daniele, Tanyalcin Ibrahim, Ferté Julien, Gazzo Andrea, Orlando Gabriele, Lenaerts Tom, Rooman Marianne, and Vranken Wim. DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic acids research, 45(W1):W201–W206, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  155. Rao Roshan, Bhattacharya Nicholas, Thomas Neil, Duan Yan, Chen Xi, Canny John, Abbeel Pieter, and Song Yun S.. Evaluating Protein Transfer Learning with TAPE, June 2019. URL http://arxiv.org/abs/1906.08230. arXiv:1906.08230 [cs, q-bio, stat]. [PMC free article] [PubMed]
  156. Rao Roshan, Liu Jason, Verkuil Robert, Meier Joshua, Canny John F., Abbeel Pieter, Sercu Tom, and Rives Alexander. MSA transformer. bioRxiv, 2021. doi: 10.1101/2021.02.12.430858. URL https://www.biorxiv.org/content/early/2021/02/13/2021.02.12.430858. [DOI] [Google Scholar]
  157. Rentzsch Philipp, Witten Daniela, Cooper Gregory M, Shendure Jay, and Kircher Martin. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic acids research, 47(D1):D886–D894, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  158. Reva Boris, Antipin Yevgeniy, and Sander Chris. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Research, 39(17):e118, September 2011. ISSN 1362–4962. doi: 10.1093/nar/gkr407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  159. Riesselman Adam J, Ingraham John B, and Marks Debora S. Deep generative models of genetic variation capture the effects of mutations. Nature Methods, 15(10):816–822, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  160. Rives Alexander, Meier Joshua, Sercu Tom, Goyal Siddharth, Lin Zeming, Liu Jason, Guo Demi, Ott Myle, Zitnick C Lawrence, Ma Jerry, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  161. Rockah-Shmuel Liat, Tóth-Petróczy Ágnes, and Tawfik Dan S.. Systematic Mapping of Protein Mutational Space by Prolonged Drift Reveals the Deleterious Effects of Seemingly Neutral Mutations. PLOS Computational Biology, 11(8):e1004421, August 2015. ISSN 1553–7358. doi: 10.1371/journal.pcbi.1004421. URL https://dx.plos.org/10.1371/journal.pcbi.1004421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  162. Romero Philip A., Krause Andreas, and Arnold Frances H.. Navigating the protein fitness landscape with Gaussian processes. Proceedings of the National Academy of Sciences, 110:E193–E201, 2012. URL https://api.semanticscholar.org/CorpusID:1093192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  163. Romero Philip A., Tran Tuan M., and Abate Adam R.. Dissecting enzyme function with microfluidic-based deep mutational scanning. Proceedings of the National Academy of Sciences, 112(23):7159–7164, June 2015. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1422285112. URL https://pnas.org/doi/full/10.1073/pnas.1422285112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  164. Roscoe Benjamin P. and Bolon Daniel N.A.. Systematic Exploration of Ubiquitin Sequence, E1 Activation Efficiency, and Experimental Fitness in Yeast. Journal of Molecular Biology, 426(15):2854–2870, July 2014. ISSN 00222836. doi: 10.1016/j.jmb.2014.05.019. URL https://linkinghub.elsevier.com/retrieve/pii/S0022283614002587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  165. Roscoe Benjamin P., Thayer Kelly M., Zeldovich Konstantin B., Fushman David, and Bolon Daniel N.A.. Analyses of the Effects of All Ubiquitin Point Mutants on Yeast Growth Rate. Journal of Molecular Biology, 425(8):1363–1377, April 2013. ISSN 00222836. doi: 10.1016/j.jmb.2013.01.032. URL https://linkinghub.elsevier.com/retrieve/pii/S0022283613000636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  166. Roychowdhury Hridindu and Romero Philip A. Microfluidic deep mutational scanning of the human executioner caspases reveals differences in structure and regulation. Cell Death Discovery, 8(1):7, January 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  167. Rubin Alan F., Min Joseph K, Rollins Nathan J., Da Estelle Y, Esposito Daniel, Harrington Matthew, Stone Jeremy, Bianchi Aisha Haley, Dias Mafalda, Frazer Jonathan, Fu Yunfan, Gallaher Molly, Li Iris, Moscatelli Olivia, Ong Jesslyn YL, Rollins Joshua E, Wakefield Matthew J., Ye Shenyi “Sunny”, Tam Amy Sze Pui, McEwen Abbye E., Starita Lea M., Bryant Vanessa L., Marks Debora S., and Fowler Douglas M.. MaveDB v2: a curated community database with over three million variant effects from multiplexed functional assays. bioRxiv, 2021. [Google Scholar]
  168. Russ William P., Figliuzzi Matteo, Stocker Christian, Barrat-Charlaix Pierre, Socolich Michael, Kast Peter, Hilvert Donald, Monasson Remi, Cocco Simona, Weigt Martin, and Ranganathan Rama. An evolution-based model for designing chorismate mutase enzymes. Science, 369(6502):440–445, July 2020. ISSN 0036–8075, 1095–9203. doi: 10.1126/science.aba3304. URL https://www.sciencemag.org/lookup/doi/10.1126/science.aba3304. [DOI] [PubMed] [Google Scholar]
  169. Samocha Kaitlin E, Kosmicki Jack A, Karczewski Konrad J, O’Donnell-Luria Anne H, Pierce-Hoffman Emma, MacArthur Daniel G, Neale Benjamin M, and Daly Mark J. Regional missense constraint improves variant deleteriousness prediction. bioRxiv, page 148353, 2017. [Google Scholar]
  170. Sarkisyan Karen S., Bolotin Dmitry A., Meer Margarita V., Usmanova Dinara R., Mishin Alexander S., Sharonov George V., Ivankov Dmitry N., Bozhanova Nina G., Baranov Mikhail S., Soylemez Onuralp, Bogatyreva Natalya S., Vlasov Peter K., Egorov Evgeny S., Logacheva Maria D., Kondrashov Alexey S., Chudakov Dmitry M., Putintseva Ekaterina V., Mamedov Ilgar Z., Tawfik Dan S., Lukyanov Konstantin A., and Kondrashov Fyodor A.. Local fitness landscape of the green fluorescent protein. Nature, 533(7603):397–401, May 2016. ISSN 0028–0836, 1476–4687. doi: 10.1038/nature17995. URL http://www.nature.com/articles/nature17995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  171. Schwarz Jana Marie, Rödelsperger Christian, Schuelke Markus, and Seelow Dominik. MutationTaster evaluates disease-causing potential of sequence alterations. Nature methods, 7(8):575–576, 2010. [DOI] [PubMed] [Google Scholar]
  172. Seuma Mireia, Faure Andre J, Badia Marta, Lehner Ben, and Bolognesi Benedetta. The genetic landscape for amyloid beta fibril nucleation accurately discriminates familial Alzheimer’s disease mutations. eLife, 10: e63364, February 2021. ISSN 2050–084X. doi: 10.7554/eLife.63364. URL https://elifesciences.org/articles/63364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  173. Seuma Mireia, Lehner Ben, and Bolognesi Benedetta. An atlas of amyloid aggregation: the impact of substitutions, insertions, deletions and truncations on amyloid beta fibril nucleation. Nature Communications, 13(1): 7084, November 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  174. Shihab Hashem A, Gough Julian, Cooper David N, Stenson Peter D, Barker Gary LA, Edwards Keith J, Day Ian NM, and Gaunt Tom R. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden markov models. Human mutation, 34(1):57–65, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  175. Shin Jung-Eun, Riesselman Adam J, Kollasch Aaron W, McMahon Conor, Simon Elana, Sander Chris, Manglik Aashish, Kruse Andrew C, and Marks Debora S. Protein design and variant prediction using autoregressive generative models. Nature communications, 12(1):1–11, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  176. Shortle D and Sondek J. The emerging role of insertions and deletions in protein engineering. Curr. Opin. Biotechnol., 6(4):387–393, August 1995. [DOI] [PubMed] [Google Scholar]
  177. Silverstein Rachel A., Sun Song, Verby Marta, Weile Jochen, Wu Yingzhou, Gebbia Marinella, Fotiadou Iosifina, Kitaygorodsky Julia, and Roth Frederick P.. A systematic genotype-phenotype map for missense variants in the human intellectual disability-associated gene GDI1. preprint, Genetics, October 2021. URL http://biorxiv.org/lookup/doi/10.1101/2021.10.06.463360. [Google Scholar]
  178. Sinai Sam, Jain Nina, Church George M, and Kelsic Eric D. Generative AAV capsid diversification by latent interpolation. preprint, Synthetic Biology, April 2021. URL http://biorxiv.org/lookup/doi/10.1101/2021.04.16.440236. [Google Scholar]
  179. Soh Yq Shirleen, Moncla Louise H, Eguia Rachel, Bedford Trevor, and Bloom Jesse D. Comprehensive mapping of adaptation of the avian influenza polymerase protein PB2 to humans. eLife, 8:e45079, April 2019. ISSN 2050–084X. doi: 10.7554/eLife.45079. URL https://elifesciences.org/articles/45079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  180. Sourisseau Marion, Lawrence Daniel J. P., Schwarz Megan C., Storrs Carina H., Veit Ethan C., Bloom Jesse D., and Evans Matthew J.. Deep Mutational Scanning Comprehensively Maps How Zika Envelope Protein Mutations Affect Viral Growth and Antibody Escape. Journal of Virology, 93(23):e01291–19, December 2019. ISSN 0022–538X, 1098–5514. doi: 10.1128/JVI.01291-19. URL https://journals.asm.org/doi/10.1128/JVI.01291-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  181. Spencer Jeffrey M. and Zhang Xiaoliu. Deep mutational scanning of S. pyogenes Cas9 reveals important functional domains. Scientific Reports, 7(1):16836, December 2017. ISSN 2045–2322. doi: 10.1038/s41598-017-17081-y. URL https://www.nature.com/articles/s41598-017-17081-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  182. Stadelmann Tobias, Heid Daniel, Jendrusch Michael, Mathony Jan, Rosset Stéphane, Correia Bruno E., and Niopek Dominik. A deep mutational scanning platform to characterize the fitness landscape of anti-CRISPR proteins. preprint, Synthetic Biology, August 2021. URL http://biorxiv.org/lookup/doi/10.1101/2021.08.21.457204. [Google Scholar]
  183. Staller Max V., Holehouse Alex S., Swain-Lenz Devjanee, Das Rahul K., Pappu Rohit V., and Cohen Barak A.. A High-Throughput Mutational Scan of an Intrinsically Disordered Acidic Transcriptional Activation Domain. Cell Systems, 6(4):444–455.e6, April 2018. ISSN 24054712. doi: 10.1016/j.cels.2018.01.015. URL https://linkinghub.elsevier.com/retrieve/pii/S2405471218300528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  184. Starita Lea M., Pruneda Jonathan N., Lo Russell S., Fowler Douglas M., Kim Helen J., Hiatt Joseph B., Shendure Jay, Brzovic Peter S., Fields Stanley, and Klevit Rachel E.. Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proceedings of the National Academy of Sciences, 110(14), April 2013. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1303309110. URL https://pnas.org/doi/full/10.1073/pnas.1303309110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  185. Starr Tyler N., Greaney Allison J., Hilton Sarah K., Ellis Daniel, Crawford Katharine H.D., Dingens Adam S., Navarro Mary Jane, Bowen John E., Tortorici M. Alejandra, Walls Alexandra C., King Neil P., Veesler David, and Bloom Jesse D.. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell, 182(5):1295–1310.e20, September 2020. ISSN 00928674. doi: 10.1016/j.cell.2020.08.012. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867420310035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  186. Steinegger Martin and Söding Johannes. Clustering huge protein sequence sets in linear time. Nature Communications, 9(1):2542, June 2018. ISSN 2041–1723. doi: 10.1038/s41467-018-04964-5. URL https://doi.org/10.1038/s41467-018-04964-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  187. Stiffler Michael A., Hekstra Doeke R., and Ranganathan Rama. Evolvability as a Function of Purifying Selection in TEM-1 beta-Lactamase. Cell, 160(5):882–892, February 2015. ISSN 00928674. doi: 10.1016/j.cell.2015.01.035. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867415000781. [DOI] [PubMed] [Google Scholar]
  188. Stourac Jan, Dúbrava Juraj, Musil Miloš, Horácková Jana, Damborský Jiří, Mazurenko Stanislav, and Bednář David. FireProtDB: database of manually curated protein stability data. Nucleic Acids Research, 49:D319–D324, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  189. Suiter Chase C., Moriyama Takaya, Matreyek Kenneth A., Yang Wentao, Scaletti Emma Rose, Nishii Rina, Yang Wenjian, Hoshitsuki Keito, Singh Minu, Trehan Amita, Parish Chris, Smith Colton, Li Lie, Bhojwani Deepa, Yuen Liz Y. P., Li Chi-kong, Li Chak-ho, Yang Yung-li, Walker Gareth J., Goodhand James R., Kennedy Nicholas A., Klussmann Federico Antillon, Bhatia Smita, Relling Mary V., Kato Motohiro, Hori Hiroki, Bhatia Prateek, Ahmad Tariq, Yeoh Allen E. J., Stenmark Pål, Fowler Douglas M., and Yang Jun J.. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proceedings of the National Academy of Sciences, 117(10):5394–5401, March 2020. ISSN 0027–8424, 1091–6490. doi: 10.1073/pnas.1915680117. URL https://pnas.org/doi/full/10.1073/pnas.1915680117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  190. Sun Song, Weile Jochen, Verby Marta, Wu Yingzhou, Wang Yang, Cote Atina G., Fotiadou Iosifina, Kitaygorodsky Julia, Vidal Marc, Rine Jasper, Ješina Pavel, Kožich Viktor, and Roth Frederick P.. A proactive genotypeto-patient-phenotype map for cystathionine beta-synthase. Genome Medicine, 12(1):13, December 2020. ISSN 1756–994X. doi: 10.1186/s13073-020-0711-1. URL https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-0711-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  191. Sundaram Laksshman, Gao Hong, Padigepati Samskruthi Reddy, McRae Jeremy F, Li Yanjun, Kosmicki Jack A, Fritzilas Nondas, Hakenberg Jörg, Dutta Anindita, Shon John, et al. Predicting the clinical impact of human mutation with deep neural networks. Nature genetics, 50(8):1161–1170, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  192. Suphatrakul Amporn, Posiri Pratsaneeyaporn, Srisuk Nittaya, Nantachokchawapan Rapirat, On-nome Suppachoke, Mongkolsapaya Juthathip, and Siridechadilok Bunpote. Functional analysis of flavivirus replicase by deep mutational scanning of dengue NS5. bioRxiv, March 2023. [Google Scholar]
  193. Suzek Baris E., Wang Yuqi, Huang Hongzhan, McGarvey Peter B., and Wu Cathy H.. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics, 31:926–932, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  194. Suzek Baris E, Wang Yuqi, Huang Hongzhan, McGarvey Peter B, Wu Cathy H, and UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics, 31(6):926–932, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  195. Tan Timothy J C, Mou Zongjun, Lei Ruipeng, Ouyang Wenhao O, Yuan Meng, Song Ge, Andrabi Raiees, Wilson Ian A, Kieffer Collin, Dai Xinghong, Matreyek Kenneth A, and Wu Nicholas C. High-throughput identification of prefusion-stabilizing mutations in SARS-CoV-2 spike. Nature Communications, 14(1):2003, April 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  196. Thompson Samuel, Zhang Yang, Ingle Christine, Reynolds Kimberly A, and Kortemme Tanja. Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme. eLife, 9:e53476, July 2020. ISSN 2050–084X. doi: 10.7554/eLife.53476. URL https://elifesciences.org/articles/53476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  197. Thyagarajan Bargavi and Bloom Jesse D. The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin. Elife, 3, July 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  198. Tóth-Petróczy Agnes and Tawfik Dan S. Protein insertions and deletions enabled by neutral roaming in sequence space. Mol. Biol. Evol., 30(4):761–771, April 2013. [DOI] [PubMed] [Google Scholar]
  199. Tripathi Arti, Gupta Kritika, Khare Shruti, Jain Pankaj C., Patel Siddharth, Kumar Prasanth, Pulianmackal Ajai J., Aghera Nilesh, and Varadarajan Raghavan. Molecular Determinants of Mutant Phenotypes, Inferred from Saturation Mutagenesis Data. Molecular Biology and Evolution, 33(11):2960–2975, November 2016. ISSN 0737–4038, 1537–1719. doi: 10.1093/molbev/msw182. URL https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msw182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  200. Tsuboyama Kotaro, Dauparas Justas, Chen Jonathan, Laine Elodie, Behbahani Yasser Mohseni, Weinstein Jonathan J., Mangan Niall M., Ovchinnikov Sergey, and Rocklin Gabriel J.. Mega-scale experimental analysis of protein folding stability in biology and protein design, December 2022. URL https://www.biorxiv.org/content/10.1101/2022.12.06.519132v3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  201. Tsuboyama Kotaro, Dauparas Justas, Chen Jonathan, Laine Elodie, Behbahani Yasser Mohseni, Weinstein Jonathan J., Mangan Niall M., Ovchinnikov Sergey, and Rocklin Gabriel J.. Mega-scale experimental analysis of protein folding stability in biology and design. Nature, 620(7973):434–444, August 2023. ISSN 0028–0836, 1476–4687. doi: 10.1038/s41586-023-06328-6. URL https://www.nature.com/articles/s41586-023-06328-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  202. UK Monogenic Diabetes Consortium, Myocardial Infarction Genetics Consortium, UK Congenital Lipodystrophy Consortium, Majithia Amit R, Tsuda Ben, Agostini Maura, Gnanapradeepan Keerthana, Rice Robert, Peloso Gina, Patel Kashyap A, Zhang Xiaolan, Broekema Marjoleine F, Patterson Nick, Duby Marc, Sharpe Ted, Kalkhoven Eric, Rosen Evan D, Barroso Inês, Ellard Sian, Kathiresan Sekar, O’Rahilly Stephen, Chatterjee Krishna, Florez Jose C, Mikkelsen Tarjei, Savage David B, and Altshuler David. Prospective functional classification of all possible missense variants in PPARG. Nature Genetics, 48(12):1570–1575, December 2016. ISSN 1061–4036, 1546–1718. doi: 10.1038/ng.3700. URL https://www.nature.com/articles/ng.3700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  203. Urbina Fabio, Lentzos Filippa, Invernizzi Cédric, and Ekins Sean. Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence, 4(3):189–191, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  204. Ursu Oana, Neal James T, Shea Emily, Thakore Pratiksha I, Jerby-Arnon Livnat, Nguyen Lan, Dionne Danielle, Diaz Celeste, Bauman Julia, Mosaad Mariam Mounir, Fagre Christian, Lo April, McSharry Maria, Giacomelli Andrew O, Ly Seav Huong, Rozenblatt-Rosen Orit, Hahn William C, Aguirre Andrew J, Berger Alice H, Regev Aviv, and Boehm Jesse S. Massively parallel phenotyping of coding variants in cancer with perturb-seq. Nature Biotechnology, 40(6):896–905, June 2022. [DOI] [PubMed] [Google Scholar]
  205. van Loggerenberg Warren, Sowlati-Hashjin Shahin, Weile Jochen, Hamilton Rayna, Chawla Aditya, Gebbia Marinella, Kishore Nishka, Frésard Laure, Mustajoki Sami, Pischik Elena, Pierro Elena Di, Barbaro Michela, Floderus Ylva, Schmitt Caroline, Gouya Laurent, Colavin Alexandre, Nussbaum Robert, Friesema Edith C H, Kauppinen Raili, To-Figueras Jordi, Aarsand Aasne K, Desnick Robert J, Garton Michael, and Roth Frederick P. Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation. bioRxiv, February 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  206. Vanella Rosario, Küng Christoph, Schoepfer Alexandre A, Doffini Vanni, Ren Jin, and Nash Michael A. Understanding Activity-Stability tradeoffs in biocatalysts by enzyme proximity sequencing. bioRxiv, March 2023. [DOI] [PMC free article] [PubMed]
  207. Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Lukasz, and Polosukhin Illia. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
  208. Veerapandian Veeramohan, Ackermann Jan Ole, Srivastava Yogesh, Malik Vikas, Weng Mingxi, Yang Xiaoxiao, and Jauch Ralf. Directed evolution of reprogramming factors by cell selection and sequencing. Stem Cell Reports, 11(2):593–606, August 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  209. Wan Aliete, Place Emily, Pierce Eric A., and Comander Jason. Characterizing variants of unknown significance in rhodopsin: A functional genomics approach. Human Mutation, 40(8):1127–1144, August 2019. ISSN 1059–7794, 1098–1004. doi: 10.1002/humu.23762. URL https://onlinelibrary.wiley.com/doi/10.1002/humu.23762. [DOI] [PMC free article] [PubMed] [Google Scholar]
  210. Weeks Ryan and Ostermeier Marc. Fitness and functional landscapes of the E. coli RNase III gene rnc. Molecular Biology and Evolution, 40(3), March 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  211. Weile Jochen, Sun Song, Cote Atina G, Knapp Jennifer, Verby Marta, Mellor Joseph C, Wu Yingzhou, Pons Carles, Wong Cassandra, Lieshout Natascha, Yang Fan, Tasan Murat, Tan Guihong, Yang Shan, Fowler Douglas M, Nussbaum Robert, Bloom Jesse D, Vidal Marc, Hill David E, Aloy Patrick, and Roth Frederick P. A framework for exhaustively mapping functional missense variants. Molecular Systems Biology, 13(12):957, December 2017. ISSN 1744–4292, 1744–4292. doi: 10.15252/msb.20177908. URL https://onlinelibrary.wiley.com/doi/10.15252/msb.20177908. [DOI] [PMC free article] [PubMed] [Google Scholar]
  212. Weile Jochen, Kishore Nishka, Sun Song, Maaieh Ranim, Verby Marta, Li Roujia, Fotiadou Iosifina, Kitaygorodsky Julia, Wu Yingzhou, Holenstein Alexander, Bürer Céline, Blomgren Linnea, Yang Shan, Nussbaum Robert, Rozen Rima, Watkins David, Gebbia Marinella, Kozich Viktor, Garton Michael, Froese D Sean, and Roth Frederick P. Shifting landscapes of human MTHFR missense-variant effects. American Journal of Human Genetics, 108(7):1283–1300, July 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  213. Weng Chenchun, Faure Andre J, and Lehner Ben. The energetic and allosteric landscape for KRAS inhibition. bioRxiv, December 2022. [DOI] [PMC free article] [PubMed]
  214. Wrenbeck Emily E., Azouz Laura R., and Whitehead Timothy A.. Single-mutation fitness landscapes for an enzyme on multiple substrates reveal specificity is globally encoded. Nature Communications, 8(1):15695, August 2017. ISSN 2041–1723. doi: 10.1038/ncomms15695. URL http://www.nature.com/articles/ncomms15695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  215. Wrenbeck Emily E, Bedewitz Matthew A, Klesmith Justin R, Noshin Syeda, Barry Cornelius S, and Whitehead Timothy A. An automated Data-Driven pipeline for improving heterologous enzyme expression. ACS Synthetic Biology, 8(3):474–481, March 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  216. Wu Nicholas C., Young Arthur P., Al-Mawsawi Laith Q., Olson C. Anders, Feng Jun, Qi Hangfei, Chen Shu-Hwa, Lu I.-Hsuan, Lin Chung-Yen, Chin Robert G., Luan Harding H., Nguyen Nguyen, Nelson Stanley F., Li Xinmin, Wu Ting-Ting, and Sun Ren. High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution. Scientific Reports, 4(1):4942, December 2014. ISSN 2045–2322. doi: 10.1038/srep04942. URL https://www.nature.com/articles/srep04942. [DOI] [PMC free article] [PubMed] [Google Scholar]
  217. Wu Nicholas C., Olson C. Anders, Du Yushen, Le Shuai, Tran Kevin, Remenyi Roland, Gong Danyang, Al-Mawsawi Laith Q., Qi Hangfei, Wu Ting-Ting, and Sun Ren. Functional Constraint Profiling of a Viral Protein Reveals Discordance of Evolutionary Conservation and Functionality. PLOS Genetics, 11(7):e1005310, July 2015. ISSN 1553–7404. doi: 10.1371/journal.pgen.1005310. URL https://dx.plos.org/10.1371/journal.pgen.1005310. [DOI] [PMC free article] [PubMed] [Google Scholar]
  218. Wu Nicholas C, Dai Lei, Olson C Anders, Lloyd-Smith James O, and Sun Ren. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife, 5:e16965, July 2016. ISSN 2050–084X. doi: 10.7554/eLife.16965. URL https://elifesciences.org/articles/16965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  219. Wu Yingzhou, Liu Hanqing, Li Roujia, Sun Song, Weile Jochen, and Roth Frederick P. Improved pathogenicity prediction for rare human missense variants. The American Journal of Human Genetics, 108(10):1891–1906, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  220. Wu Zachary, Kan S. B. Jennifer, Lewis Russell D., Wittmann Bruce J., and Arnold Frances H.. Machine learning-assisted directed protein evolution with combinatorial libraries. Proceedings of the National Academy of Sciences, 116:8852–8858, 2019. URL https://api.semanticscholar.org/CorpusID:67770057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  221. Xie Michael J, Cromie Gareth A, Owens Katherine, Timour Martin S, Tang Michelle, Kutz J Nathan, El-Hattab Ayman W, McLaughlin Richard N, and Dudley Aimée M. Predicting the functional effect of compound heterozygous genotypes from large scale variant effect maps. bioRxiv, January 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  222. Xu Minghao, Zhang Zuobai, Lu Jiarui, Zhu Zhaocheng, Zhang Yangtian, Ma Chang, Liu Runcheng, and Tang Jian. PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding, September 2022. URL http://arxiv.org/abs/2206.02096. arXiv:2206.02096 [cs].
  223. Yang Kevin Kaichuang, Wu Zachary, and Arnold Frances H.. Machine-learning-guided directed evolution for protein engineering. Nature Methods, 16(8):687–694, 2019. URL https://api.semanticscholar.org/CorpusID:128342395. [DOI] [PubMed] [Google Scholar]
  224. Yang Kevin Kaichuang, Lu Alex X., and Fusi Nicoló. Convolutions are competitive with transformers for protein sequence pretraining. bioRxiv, 2023a. URL https://api.semanticscholar.org/CorpusID:248990392. [DOI] [PubMed] [Google Scholar]
  225. Yang Kevin Kaichuang, Zanichelli Niccoló, and Yeh Hugh. Masked inverse folding with sequence transfer for protein representation learning. bioRxiv, 2023b. URL https://api.semanticscholar.org/CorpusID:249241961. [DOI] [PubMed] [Google Scholar]
  226. Yee Sook Wah, Macdonald Christian, Mitrovic Darko, Zhou Xujia, Koleske Megan L, Yang Jia, Silva Dina Buitrago, Grimes Patrick Rockefeller, Trinidad Donovan, More Swati S, Kachuri Linda, Witte John S, Delemotte Lucie, Giacomini Kathleen M, and Coyote-Maestas Willow. The full spectrum of OCT1 (SLC22A1) mutations bridges transporter biophysics to drug pharmacogenomics. bioRxiv, June 2023. [Google Scholar]
  227. Young Heather J., Chan Matthew, Selvam Balaji, Szymanski Steven K., Shukla Diwakar, and Procko Erik. Deep Mutagenesis of a Transporter for Uptake of a Non-Native Substrate Identifies Conformationally Dynamic Regions. preprint, Biochemistry, April 2021. URL http://biorxiv.org/lookup/doi/10.1101/2021.04.19.440442. [Google Scholar]
  228. Zhang Haicang, Xu Michelle S, Fan Xiao, Chung Wendy K, and Shen Yufeng. Predicting functional effect of missense variants using graph attention neural networks. Nature Machine Intelligence, 4(11):1017–1028, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  229. Zhou Naihui, Jiang Yuxiang, Bergquist Timothy, Lee Alexandra J., Kacsoh Balint Z., Crocker Alex, Lewis Kimberley A., Georghiou George E., Nguyen Huy N., Hamid Nafiz Imtiaz Bin, Davis Larry, Dogan Tunca, Atalay Volkan, Rifaioglu Ahmet Sureyya, Dalkiran Alperen, Cetin-Atalay Rengul, Zhang Chengxin, Hurto Rebecca L., Freddolino Peter L., Zhang Yang, Bhat Prajwal, Supek Fran, Fernández José María, Gemović Branislava, Perovic Vladimir, Davidovic Radoslav, Sumonja Neven, Veljkovic Nevena, Asgari Ehsaneddin, Mofrad Mohammad R. K., Profiti Giuseppe, Savojardo Castrense, Martelli Pier Luigi, Casadio Rita, Boecker Florian, Kahanda Indika, Thurlby Natalie, Mchardy Alice, Renaux Alexandre, Saidi Rabie, Gough Julian, Freitas Alex Alves, Antczak Magdalena, Fabris Fábio, Wass Mark N., Hou Jie, Cheng Jianlin, Wang Zheng, Romero Alfonso E., Paccanaro Alberto, Yang Haixuan, Goldberg Tatyana, Zhao Chenguang, Holm Liisa, Törönen Petri, Medlar Alan, Zosa Elaine, Borukhov Itamar, Novikov Ilya B., Wilkins Angela D., Lichtarge Olivier, Chi Po-Han, Tseng Wei-Cheng, Linial Michal, Rose Peter W., Dessimoz Christophe, Vidulin Vedrana, Džeroski Sašo, Sillitoe Ian P. W., Das Sayoni, Lees Jonathan G., Jones David T., Wan Cen, Cozzetto Domenico, Fa Rui, Torres Mateo, Vesztrocy Alex Warwick, Rodriguez Jose Manuel, Tress Michael L., Frasca Marco, Notaro Marco, Grossi Giuliano, Petrini Alessandro, Ré Matteo, Valentini Giorgio, Mesiti Marco, Roche Daniel B., Reeb Jonas, Ritchie David W., Aridhi Sabeur, Alborzi Seyed Ziaeddin, Devignes Marie-Dominique, Koo Da Chen Emily, Bonneau Richard, Gligorijević Vladimir, Barot Meet, Fang Hai, Toppo Stefano, Lavezzo Enrico, Falda Marco, Berselli Michele, Tosatto Silvio C. E., Carraro Marco, Piovesan Damiano, Rehman Hafeez ur, Mao Qizhong, Zhang Shanshan, Vucetic Slobodan, Black Gage S, Jo Dane, Larsen Dallas J., Omdahl Ashton, Sagers Luke, Suh Erica, Dayton Jonathan B., McGuffin Liam James, Brackenridge Danielle Allison, Babbitt Patricia C., Yunes Jeffrey M., Fontana Paolo, Zhang Feng, Zhu Shanfeng, You Ronghui, Zhang Zihan, Dai Suyang, Yao Shuwei, Tian Weidong, Cao Renzhi, Chandler Caleb, Amezola Miguel, Johnson Devon, Chang Jia-Ming, Liao Wen-Hung, Liu Yi-Wei, Pascarelli Stefano, Frank Yotam, Hoehndorf R., Kulmanov Maxat, Boudellioua Imane, Politano Gianfranco, Carlo Stefano Di, Benso Alfredo, Hakala Kai, Ginter Filip, Mehryary Farrokh, Kaewphan Suwisa, Björne Jari, Moen Hans, Tolvanen Martti, Salakoski Tapio, Kihara Daisuke, Jain Aashish, Šmuc Tomislav, Altenhoff Adrian M., Ben-Hur Asa, Rost Burkhard, Brenner Steven E., Orengo Christine A., Jeffery Constance J., Bosco Giovanni, Hogan Deborah A., Martin Maria Jesus, O’Donovan Claire, Mooney Sean D., Greene Casey S., Radivojac Predrag, and Friedberg Iddo. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biology, 20:244, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  230. Zinkus-Boltz Julia, DeValk Craig, and Dickinson Bryan C.. A Phage-Assisted Continuous Selection Approach for Deep Mutational Scanning of Protein–Protein Interactions. ACS Chemical Biology, 14(12):2757–2767, December 2019. ISSN 1554–8929, 1554–8937. doi: 10.1021/acschembio.9b00669. URL https://pubs.acs.org/doi/10.1021/acschembio.9b00669. [DOI] [PubMed] [Google Scholar]

Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
