PLOS Computational Biology. 2020 Jun 8;16(6):e1007447. doi: 10.1371/journal.pcbi.1007447

Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface

Anna U Lowegard 1,2,#, Marcel S Frenkel 3,#, Graham T Holt 1,2, Jonathan D Jou 2, Adegoke A Ojewole 1,2, Bruce R Donald 2,3,*
Editor: Roland L Dunbrack Jr 4
PMCID: PMC7329130  PMID: 32511232

Abstract

The K* algorithm provably approximates partition functions for a set of states (e.g., protein, ligand, and protein-ligand complex) to a user-specified accuracy ε. Often, reaching an ε-approximation for a particular set of partition functions takes a prohibitive amount of time and space. To alleviate some of this cost, we introduce two new algorithms into the osprey suite for protein design: fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. fries pre-processes the sequence space to limit a design to only the most stable, energetically favorable sequence possibilities. EWAK* then takes this pruned sequence space as input and, using a user-specified energy window, calculates K* scores using the lowest energy conformations. We expect fries/EWAK* to be most useful in cases where there are many unstable sequences in the design sequence space and when users are satisfied with enumerating the low-energy ensemble of conformations. In combination, these algorithms provably retain calculational accuracy while limiting the input sequence space and the conformations included in each partition function calculation to only the most energetically favorable, effectively reducing runtime while still enriching for desirable sequences. This combined approach led to significant speed-ups compared to the previous state-of-the-art multi-sequence algorithm, BBK*, while maintaining its efficiency and accuracy, which we show across 40 different protein systems and a total of 2,826 protein design problems. Additionally, as a proof of concept, we used these new algorithms to redesign the protein-protein interface (PPI) of the c-Raf-RBD:KRas complex. The Ras-binding domain of the protein kinase c-Raf (c-Raf-RBD) is the tightest known binder of KRas, a protein implicated in difficult-to-treat cancers. fries/EWAK* accurately retrospectively predicted the effect of 41 different sets of mutations in the PPI of the c-Raf-RBD:KRas complex. 
Notably, these mutations include mutations whose effect had previously been incorrectly predicted using other computational methods. Next, we used fries/EWAK* for prospective design and discovered a novel point mutation that improves binding of c-Raf-RBD to KRas in its active, GTP-bound state (KRasGTP). We combined this new mutation with two previously reported mutations (which were highly-ranked by osprey) to create a new variant of c-Raf-RBD, c-Raf-RBD(RKY). fries/EWAK* in osprey computationally predicted that this new variant binds even more tightly than the previous best-binding variant, c-Raf-RBD(RK). We measured the binding affinity of c-Raf-RBD(RKY) using a bio-layer interferometry (BLI) assay, and found that this new variant exhibits single-digit nanomolar affinity for KRasGTP, confirming the computational predictions made with fries/EWAK*. This new variant binds roughly five times more tightly than the previous best known binder and roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD). This study steps through the advancement and development of computational protein design by presenting theory, new algorithms, accurate retrospective designs, new prospective designs, and biochemical validation.

Author summary

Computational structure-based protein design is an innovative tool for redesigning proteins to introduce a particular or novel function. One such function is improving the binding of one protein to another, which can increase our understanding of important protein systems. Herein we introduce two novel, provable algorithms, fries and EWAK*, for more efficient computational structure-based protein design as well as their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. These new algorithms speed up computational structure-based protein design while maintaining accurate calculations, allowing for larger, previously infeasible protein designs. Additionally, using fries and EWAK* within the osprey suite, we designed the tightest known binder of KRas, a heavily studied cancer target that interacts with a number of different proteins. This previously undiscovered variant of a KRas-binding domain, c-Raf-RBD, has the potential to serve as a tool to further probe the protein-protein interface of KRas with its effectors, and its discovery alone emphasizes the potential for more successful applications of computational structure-based protein design.

Introduction

Computational structure-based protein design (CSPD) is an innovative tool that enables the prediction of protein sequences with desired biochemical properties (such as improved binding affinity). osprey (Open Source Protein Redesign for You) [1] is an open-source, state-of-the-art software package used for CSPD and is available at http://www.cs.duke.edu/donaldlab/osprey.php for free. osprey’s algorithms focus on provably returning the optimal sequences and conformations for a given input model. In contrast, as argued in [2–7], stochastic, non-deterministic approaches [8–10] provide no guarantees on the quality of conformations, or sequences, and make determining sources of error in predicted designs very difficult.

When using osprey, the input model generally consists of a protein structure, a flexibility model (e.g., choice of sidechain or backbone flexibility, allowed mutable residues, etc.), and an all-atom pairwise-decomposable energy function that is used to evaluate conformations. osprey models amino acid sidechains using frequently observed rotational isomers or “rotamers” [11]. osprey can also model continuous sidechain flexibility [12–15] along with discrete and continuous backbone flexibility [16–19], which allow for a more accurate approximation of protein behavior [13, 16, 20–23]. The output produced by CSPD generally consists of a set of candidate sequences and conformations. Many protein design methods have focused on computing a global minimum energy conformation (GMEC) [14, 18, 24–28]. However, a protein in solution exists not as a single, low-energy structure, but as a thermodynamic ensemble of conformations. Models that only consider the GMEC may incorrectly predict biophysical properties such as binding [12, 20–23, 29–31] because GMEC-based algorithms underestimate potentially significant entropic contributions. In contrast to GMEC-based approaches, the K* algorithm [12, 29, 30] in osprey provably approximates the Boltzmann-weighted partition function for a protein state, thereby modeling the thermodynamic ensemble. When designing for binding affinity, this enables the designer to calculate the K* score, a ratio of the Boltzmann-weighted partition functions for a protein-ligand complex that estimates the association constant, Ka (further detailed in the Section entitled “Computational materials and methods”). BBK* [32] is an efficient, multi-sequence design algorithm that calls the K* algorithm as a subroutine. Previous algorithms [12, 27, 29, 30, 33–35] that design for binding affinity using ensembles are linear in the size of the sequence space N, where N is exponential in the number of simultaneously mutable residue positions. 
BBK* is the first provable ensemble-based algorithm to run in time sublinear in N, making it possible not only to perform K* designs over large sequence spaces, but also to enumerate a gap-free list of sequences in order of decreasing K* score.
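
To make the growth of N concrete, the following minimal Python sketch counts sequences as the product of the number of allowed residue types at each mutable position. The helper name and the per-position count of 21 are illustrative assumptions (21 is used only because 21² = 441 and 21³ = 9,261 match sequence-space sizes quoted for designs in this study); it is not a description of how osprey itself enumerates sequences.

```python
import math


def sequence_space_size(options_per_position):
    """Number of sequences N when each mutable position independently
    takes one of the given number of residue types (hypothetical helper)."""
    return math.prod(options_per_position)


# Assumed for illustration: 21 allowed residue types per mutable position.
for n_mutable in (1, 2, 3, 4):
    print(n_mutable, sequence_space_size([21] * n_mutable))
```

Each additional mutable position multiplies N by the number of options at that position, which is why an algorithm that is linear in N becomes expensive quickly.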

osprey has been used successfully on several empirical, prospective designs including designing enzymes [12, 16, 22, 29, 36], resistance mutations [2, 37, 38], protein-protein interaction inhibitors [30, 39], epitope-specific antibody probes [40], and broadly-neutralizing antibodies [41, 42]. These successes have been validated experimentally in vitro and in vivo and are now being tested in several clinical trials [43–45]. However, while osprey has been successful in the past, as the size of protein design problems grows (e.g., when considering a large protein-protein interface), enumerating and minimizing the necessary number of conformations and sequences to satisfy the provable halting criteria in previous K*-based algorithms [12, 29, 30] becomes prohibitive (despite recent algorithmic improvements [32]). The entire conformation space can be monumental in size and heavily populated with energetically unfavorable sequences and conformations. EWAK*, an Energy Window Approximation to K*, seeks to alleviate some of this difficulty by restricting the conformations included in each sequence’s thermodynamic ensemble. EWAK* guarantees that each conformational ensemble contains all of the lowest energy conformations within an energy window of the GMEC for each design sequence. fries, a Fast Removal of Inadequately Energied Sequences, also mitigates this complexity problem by limiting the input sequence space to only the most favorable, low energy sequences. Previous algorithms have focused on optimizing for sequences whose conformations are similar in energy to that of the GMEC. In contrast, fries focuses on optimizing for sequences with energies better than or comparable to the wild-type sequence. fries guarantees that the restricted input sequence space includes all of the sequences within an energy window of the wild-type sequence, but excludes any potentially unstable sequences with significantly worse partition function values. 
Wild-type sequences are generally expected to be near-optimal for their corresponding folds [46]. Therefore, limiting the sequence space to sequences energetically similar to or better than the wild-type sequence is reasonable.

We compare BBK* with K* (henceforth referred to as BBK*) to BBK* with EWAK* and fries (henceforth referred to as EWAK* and fries) to test our new methods. The implementation details of these algorithms involve some technical distinctions, which are discussed in S4 Text. Compared to the previous state-of-the-art algorithm BBK*, fries and EWAK* improve runtimes by up to 2 orders of magnitude, fries decreases the size of the sequence space by up to 2 orders of magnitude, and EWAK* decreases the number of conformations included in partition function calculations by up to almost 2 orders of magnitude. These improvements are shown across 2,826 protein design problems spanning 40 different protein systems (see the Section entitled “Computational experiments” for more details).

As a proof of concept to further test these algorithms and our design approach, we used fries and EWAK* to study the protein-protein interface (PPI) of KRasGTP in complex with its tightest-binding effector, c-Raf. As described in the Section entitled “Computational redesign of the c-Raf-RBD:KRas protein-protein interface,” KRas is an important cancer target that has been heavily studied and exhibits a thoroughly optimized protein-protein interface in its interactions with its effectors [47–59]. For this study, we focused on the redesign of the c-Raf Ras-binding domain (c-Raf-RBD), the tightest known naturally-occurring binding partner of KRas, in complex with KRasGTP (c-Raf-RBD:KRasGTP). First, our new algorithms successfully retrospectively predicted the effect on binding of mutations in the c-Raf-RBD:KRasGTP PPI even where other computational methods previously failed [60]. Next, we used fries/EWAK* prospectively to predict the effect of novel, previously unreported mutations in the PPI of the c-Raf-RBD:KRasGTP complex. We then screened the top osprey-predicted c-Raf-RBD variants using a bio-layer interferometry (BLI) assay single-concentration screen. Looking at the dissociation rates, this screen suggested that one of our new computationally-predicted c-Raf-RBD variants, c-Raf-RBD(Y) (a c-Raf-RBD that includes the mutation V88Y), exhibits improved binding to KRasGTP. Next, we created a c-Raf-RBD variant, c-Raf-RBD(RKY), that included this new mutation, V88Y, together with two previously reported mutations [60], N71R and A85K. fries/EWAK* computationally predicted that c-Raf-RBD(RKY) would bind more tightly to KRasGTP than any other variant. The single-concentration screen using BLI also suggested that c-Raf-RBD(RKY) binds more tightly to KRasGTP than the previously reported best variant [60]. 
The Kd values for the most promising variants were measured using a BLI assay with titration which confirmed our computational predictions and that, to the best of our knowledge, the novel construct c-Raf-RBD(RKY) is the highest affinity variant ever designed, with single-digit nanomolar affinity for KRasGTP and binding roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD).

Computational materials and methods

The K* algorithm’s [12, 29, 30] K* score serves as an estimate of the binding constant, Ka, and is calculated by first approximating the Boltzmann-weighted partition function of each state: unbound protein (P), unbound ligand (L), and the bound protein-ligand complex (C). Each Boltzmann-weighted partition function Zx(s), x ∈ {P, L, C}, is defined as:

$$Z_x(s) = \sum_{d \in Q(s)} \exp\left(-E_x(d)/RT\right). \qquad (1)$$

If s is any (generally amino acid) sequence of n residues, then Q(s) is the set of conformations defined by s, Ex(d) is the minimized energy of a conformation d in state x, and R and T are the gas constant and temperature, respectively. Many protein design algorithms approximate these partition functions for each state using either stochastic [61–64] or provable [2, 12, 29–31, 33, 64] methods.
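
As a minimal illustration of Eq (1), the sketch below sums Boltzmann weights over a list of minimized conformation energies. The function name, the example energies, the use of kcal/mol, and the temperature of 298.15 K are assumptions made for this example; it is not osprey's implementation, which never enumerates Q(s) exhaustively.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def partition_function(minimized_energies, temperature=298.15):
    """Boltzmann-weighted sum over minimized conformation energies (kcal/mol)."""
    rt = R_KCAL * temperature
    return sum(math.exp(-energy / rt) for energy in minimized_energies)


# Hypothetical minimized energies for the conformations of one sequence.
energies = [-10.0, -9.5, -8.0, -3.0]
z = partition_function(energies)
print(f"Z = {z:.3e}")
```

Because the weights are exponential in energy, the lowest-energy conformations dominate the sum, which is the observation that energy-window methods rely on.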

osprey’s K* algorithm approximates these partition functions to within a user-specified ε of the full partition function as defined in Eq (1) where C, P, and L refer to the protein-ligand complex, the unbound protein, and the unbound ligand, respectively. The binding affinity for sequence s is defined as:

$$K_a(s) = \frac{Z_C(s)}{Z_P(s)\,Z_L(s)}. \qquad (2)$$
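
The ratio in Eq (2) can be sketched in a few lines of Python. The function names and the example partition function values are hypothetical; working in log10 space is a convenience assumed here because partition function values can span many orders of magnitude.

```python
import math


def kstar_score(z_complex, z_protein, z_ligand):
    """Ratio of the complex partition function to the product of the
    unbound protein and unbound ligand partition functions."""
    if z_protein <= 0 or z_ligand <= 0:
        raise ValueError("unbound partition functions must be positive")
    return z_complex / (z_protein * z_ligand)


def log10_kstar_score(z_complex, z_protein, z_ligand):
    """Same ratio, reported in log10 units."""
    return math.log10(z_complex) - math.log10(z_protein) - math.log10(z_ligand)


# Hypothetical partition function values for one sequence.
print(log10_kstar_score(1e30, 1e10, 1e5))
```

Note that the score rises either when the complex partition function grows or when an unbound partition function shrinks, so a high score alone does not distinguish improved binding from a destabilized unbound state.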

The K* algorithm provably approximates this binding affinity. This is enabled by the use of A* [4, 12, 26, 65], which allows for the gap-free enumeration of conformations in order of increasing lower bounds on energy [26]. However, enumerating a sufficient number of these conformations to obtain a guaranteed ε-approximation can be very time consuming because the set of all conformations Q(s) grows exponentially with the number of residues n. Also, the K* algorithm was originally [12, 29, 30] limited to computing a K* score for every sequence in the sequence space as defined by the input model for a particular design. However, BBK* [32] builds on K* and returns the top m sequences along with their ε-approximate K* scores and runs in time sublinear in the number of sequences. That is, BBK* does not require calculating ε-approximate K* scores for (or even examining) every sequence in the sequence space before it returns the top sequences. Nevertheless, BBK* may spend unnecessary time and resources evaluating unfavorable sequences before deciding to prune them. These previous methods, while efficient, suffer from two practical drawbacks. First, some returned sequences exhibit a large K* score (i.e. are predicted to improve binding) due to a decrease in stability of the unbound states. These sequences are rarely desirable in practice, since decreasing protein stability can result in poor folding and aggregation. Second, the approximation error for some sequences is slow to approach epsilon which can lead to prohibitively slow designs.
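
The idea of halting once an ε-approximation is guaranteed can be sketched as follows. This is a simplified illustration rather than osprey's implementation: it assumes the conformations are already available as (lower bound, minimized energy) pairs sorted by increasing lower bound, that each minimized energy is no lower than its bound, and that the total number of conformations |Q(s)| is known. All names and numbers are hypothetical.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at an assumed 298.15 K


def epsilon_approx_partition_function(conformations, total_count, epsilon):
    """Accumulate Boltzmann weights until the running sum q_star is provably
    at least (1 - epsilon) times the full partition function.

    conformations: (lower_bound, minimized_energy) pairs sorted by
                   increasing lower_bound.
    total_count:   size of the full conformation space.
    Returns (q_star, number_of_minimized_conformations).
    """
    q_star = 0.0
    n_minimized = 0
    for lower_bound, minimized_energy in conformations:
        # Every conformation not yet minimized has energy >= lower_bound,
        # so it contributes at most exp(-lower_bound / RT).
        remaining = total_count - n_minimized
        rest_upper = remaining * math.exp(-lower_bound / RT)
        if q_star > 0 and q_star >= (1 - epsilon) * (q_star + rest_upper):
            break
        q_star += math.exp(-minimized_energy / RT)
        n_minimized += 1
    return q_star, n_minimized


# Hypothetical conformations for one sequence.
confs = [(-10.0, -9.8), (-9.0, -8.9), (-5.0, -4.5), (-1.0, -0.5)]
print(epsilon_approx_partition_function(confs, len(confs), epsilon=0.1))
```

The smaller ε is, the more conformations must be minimized before the bound on the unenumerated remainder becomes negligible, which is the cost that the methods introduced below aim to reduce.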

To overcome the above limitations of BBK* and K*, we introduce fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. These two algorithms limit the input sequence space and the number of conformations included in each partition function estimate when approximating a sequence’s K* score to only the most energetically favorable options (see Fig 1). The fries/EWAK* approach limits the number of conformations that must be enumerated (see the Section entitled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores”), which leads to significant speed-ups (see the Section entitled “fries/EWAK* is up to 2 orders of magnitude faster than BBK*”) because each enumerated conformation must undergo an energy minimization step. This minimization step is relatively expensive, so anything that reduces the number of minimized conformations without sacrificing provable accuracy is desirable. For the importance of this minimization step to biological accuracy, see the discussions of continuous flexibility and its comparison to discrete flexibility in [4, 5, 7, 13, 14, 19]. EWAK* also maintains the advances made by BBK* including running in time sublinear in the number of sequences N and returning sequences in order of decreasing K* score. fries and EWAK* are described in further detail in the Section entitled “Algorithms” below.

Fig 1. Design example using the structure of the LecB lectin Pseudomonas aeruginosa strain PA14 (PDB ID: 5A6Y [67]) and the osprey workflow for fries/EWAK*.

In the top panel, the full, 4-domain structure of lectin is shown on the left-hand side. (A) Zooming in on the region where domains A (green) and D (yellow) interact, showing the two mutable residues (Q80 and I82) along with the surrounding flexible shell of residues as lines. There were 11 flexible residues included in this design with Q80 and I82 allowed to mutate to all other amino acids except for proline. This design consisted of 8.102 × 10^11 conformations and 441 sequences. fries limited this space to 5.704 × 10^11 conformations and 206 sequences. fries/EWAK* in combination reduced the amount of time taken by about 75% compared to BBK*. fries alone was responsible for roughly 50% of this speed-up. (B) 10 low-energy conformations included in the thermodynamic ensemble of the design sequence with mutations Q80I and I82F. For this particular sequence, BBK* minimized 10,664 conformations while EWAK* minimized only 4,104 conformations. The bottom panel shows the general workflow for fries/EWAK*. The workflow begins with the input model (as described in the Section entitled “Computational materials and methods”), which defines the design space for the first algorithm, fries. fries proceeds to prune the sequence space as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)” and as illustrated in the Venn diagram with the unpruned space shown as a yellow disk. Next, the remaining fries sequence space defines the conformation space (which contains multiple sequences as well as conformations) searched with EWAK*. EWAK* limits the conformations included in each partition function as described in the Section entitled “Energy Window Approximation to K* (EWAK*).” EWAK* generally searches over only a subset of the conformations (green area) that previous K*-based algorithms like BBK* [32] search (orange area). EWAK* then returns the top sequences based on decreasing K* score.

Algorithms

Fast removal of inadequately energied sequences (fries)

Generally in protein design when optimizing a protein-protein interface (PPI) for affinity, the designer aims to improve the K* score of a variant sequence relative to the wild-type sequence, and, when performing a design targeting a similar fold, to minimally perturb the native structure. To accomplish this, fries guarantees to only keep sequences whose partition function values are not markedly worse than the wild-type sequence’s partition function values for all of the design states (e.g. protein, ligand, and complex). How many orders of magnitude worse a particular sequence’s partition function values are allowed to be is determined by a user-specified value m. The fries algorithm prunes sequences that exhibit massive decreases in partition function values that signal an increased risk of disturbing the native structure of the states in a given system. However, sequences with markedly worse, lower partition function values may be required when searching for, for example, resistance mutations, where positive and negative design are necessary [2, 37, 38]. Importantly, fries does still allow for sequences that may have lower, worse partition function values by allowing the user to specify how many orders of magnitude lower a candidate sequence’s partition function is allowed to be relative to the wild-type sequence’s partition function.

The following algorithm is applied to each of the three states (protein, ligand, and protein-ligand complex) independently. The resulting, filtered sequence space is determined by taking the intersection of the output from the algorithm for the three states. To prune the input sequence space, fries exploits A* over a multi-sequence tree (as is described and used in comets [66]), which enjoys a fast sequence enumeration in order of lower bound on minimized energy. Each sequence v in this multi-sequence tree [66] has a corresponding single-sequence conformation tree, viz., a tree that can be searched for the lowest energy conformations for a sequence v. fries first enumerates sequences (in order of energy lower bounds) in the multi-sequence tree until the wild-type sequence is found. Then, fries searches the wild-type’s corresponding single-sequence conformation tree using A*. The first conformation enumerated according to monotonic lower bound on pairwise minimized energy is then subjected to a full-atom minimization [30] to calculate the minimized energy of one of the wild-type sequence’s conformations EWT. It is worth noting that fries only descends into and searches the single-sequence conformation tree for the wild-type sequence in order to calculate the provable halting criteria for Eq 3. fries then continues enumerating sequences in the multi-sequence tree in order of increasing lower bound on minimized energy (as described in more detail in [66]) until the lower bound on the optimal conformational energy for a sequence v, Ev, is greater than EWT + w where EWT is as described above and w is a user-specified energy window value (see Fig 2). Any variant sequence v with a lower bound on minimized energy Ev not satisfying the following criterion is pruned:

$$E_v \leq E_{WT} + w. \qquad (3)$$
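
The energy-window criterion can be sketched in Python as below. This is an illustration under stated assumptions, not the fries implementation: sorting a dictionary of precomputed lower bounds stands in for the A* enumeration over the multi-sequence tree, and the sequence names, bounds, wild-type energy, and window are all hypothetical.

```python
def fries_energy_window(sequence_lower_bounds, e_wt, window):
    """Keep sequences whose lower bound on minimized energy is at most
    e_wt + window; prune the rest.

    sequence_lower_bounds: dict mapping sequence name -> lower bound (kcal/mol).
    e_wt:   minimized energy of the wild-type conformation found first.
    window: user-specified energy window w.
    Returns (kept, pruned), each in order of increasing lower bound.
    """
    ordered = sorted(sequence_lower_bounds.items(), key=lambda item: item[1])
    for index, (_, lower_bound) in enumerate(ordered):
        if lower_bound > e_wt + window:
            # Bounds only increase from here on, so enumeration can stop.
            kept = [name for name, _ in ordered[:index]]
            pruned = [name for name, _ in ordered[index:]]
            return kept, pruned
    return [name for name, _ in ordered], []


# Hypothetical lower bounds (kcal/mol) for four sequences.
bounds = {"WT": -50.0, "A": -52.0, "B": -47.5, "C": -40.0}
print(fries_energy_window(bounds, e_wt=-49.0, window=4.0))
```

Because sequences arrive in order of increasing lower bound, the first sequence that violates the window also rules out every sequence after it, which is what lets the enumeration halt early.
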
Fig 2. How fries chooses which sequences to keep and which sequences to prune.

The solid curve represents the energy landscape of the conformation space that spans across, in this example, 7 different sequences (separated by dotted lines). Each sequence is labeled on the x-axis with an index indicating the order with which it is (or would be) enumerated with fries in order of increasing lower bound on minimized energy (red dotted curve). fries continues to enumerate in this way until it encounters the wild-type sequence (green), at which point fries calculates the minimized energy EWT of the conformation with the lowest lower bound on minimized energy for the wild-type sequence (marked with a green dot). EWT then becomes the baseline from which fries can provably enumerate all remaining sequences within some user-specified energy window w (yellow lines). Finally, fries prunes the sequences with energies provably higher than EWT + w (black) and keeps the sequences that occur within the shaded yellow region (colored in blue and green). More sequences are also pruned according to their partition function values as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)” and Eq (6).

This criterion guarantees that the remaining, unpruned sequence space includes all sequences within an energy window of the wild-type sequence’s energy. fries enumerates sequences in order of increasing lower bound on minimized energy. Therefore, it calculates an upper bound qv on the partition function for each sequence v by Boltzmann-weighting the lower bound on its energy Ev and multiplying it by the size of the conformation space for that particular sequence |Q(v)|:

$$q_v = |Q(v)| \exp\left(-E_v/RT\right). \qquad (4)$$

The lower bound for the wild-type sequence qWT is calculated by Boltzmann-weighting the minimized energy of the single conformation found during the sequence search for the wild-type sequence EWT:

$$q_{WT} = \exp\left(-E_{WT}/RT\right). \qquad (5)$$

qWT is a lower bound because, in the worst case, at least this one conformation will contribute to the partition function for the wild-type sequence. fries then uses these bounds to remove all of the sequences whose partition function value is not within some user-specified m orders of magnitude of the lower bound on the wild-type partition function qWT. If the following criterion is not met, the sequence v is pruned from the space:

$$\ln q_v \geq \ln q_{WT} + m. \qquad (6)$$

fries prunes sequences for the protein, the ligand, and the protein-ligand complex independently, limiting the input sequence space to exclude unfavorable sequences for all of the states. The resulting smaller sequence space is subsequently used as input for EWAK*. The set of sequences remaining is guaranteed to include all of the sequences within a user-specified energy window w of the wild-type sequence that also satisfy the partition function criterion given in Eq (6). Importantly, fries can be used to limit the size of the input sequence space in this fashion for any of the protein design algorithms available within osprey.
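
The partition function bounds and the per-state intersection can be sketched as follows. Several points here are assumptions made for illustration: the bounds are computed in natural-log space to avoid overflow, Eq (6) is read as ln q_v ≥ ln q_WT + m (so tolerating sequences worse than the wild type corresponds to a negative m under this reading), and all function names, energies, and sequence labels are hypothetical rather than taken from osprey.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at an assumed 298.15 K


def ln_upper_bound(num_conformations, energy_lower_bound):
    """ln of the Eq (4) upper bound: |Q(v)| * exp(-E_v / RT)."""
    return math.log(num_conformations) - energy_lower_bound / RT


def ln_wild_type_lower_bound(e_wt):
    """ln of the Eq (5) lower bound: exp(-E_WT / RT)."""
    return -e_wt / RT


def passes_partition_filter(ln_qv, ln_qwt, m):
    """Eq (6) as read here (assumption): keep when ln q_v >= ln q_WT + m."""
    return ln_qv >= ln_qwt + m


def surviving_sequences(kept_by_state):
    """Intersect the per-state kept sets (protein, ligand, complex)."""
    return set.intersection(*(set(kept) for kept in kept_by_state.values()))


# Hypothetical per-state results after both pruning criteria.
kept = {"protein": {"WT", "A", "B"}, "ligand": {"WT", "B", "C"}, "complex": {"WT", "B"}}
print(sorted(surviving_sequences(kept)))
```

A sequence must pass in every state to survive, so a variant that is acceptable in the complex but strongly destabilizes the unbound protein is still removed.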

Energy window approximation to K* (EWAK*)

After reducing the size of the input sequence space using fries, as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries),” EWAK* proceeds by using a variation on an existing algorithm: BBK* (described in [32]). The crucial difference between BBK* and EWAK* is that with EWAK* the ensemble of conformations used to approximate each K* score is limited to those within a user-specified energy window of the GMEC for each sequence. This guarantees to populate the partition function for a particular sequence and state with all of the lowest, most-favorable conformations (that fall within the user-specified energy window). Limiting the partition functions to only these energetically favorable conformations can effectively reduce runtime while still enriching for desirable sequences. These conformations often account for the majority of the full ε-approximate partition function (see the Section entitled “Computational materials and methods”) in traditional K* calculations [12]. Hence, EWAK* also empirically enjoys negligible loss in accuracy of K* scores (see the Sections labeled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores” and “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas”). EWAK* retains the beneficial aspects of BBK*, including returning sequences in order of decreasing predicted binding affinity and running in time sublinear in the number of sequences. For a discussion of the relationship between ε and the energy window w, the interested reader is invited to refer to the SI.
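
The effect of the energy window on a single partition function can be illustrated with a short sketch. It assumes the minimized energies of a sequence's conformations are already known, which sidesteps the A* enumeration that EWAK* actually performs; the function name, energies, window, and temperature are hypothetical.

```python
import math

RT = 0.0019872 * 298.15  # kcal/mol at an assumed 298.15 K


def windowed_partition_function(minimized_energies, window):
    """Sum Boltzmann weights only over conformations whose minimized energy
    lies within `window` kcal/mol of the lowest energy (the GMEC).

    Returns (windowed sum, number of conformations included).
    """
    gmec_energy = min(minimized_energies)
    included = [e for e in minimized_energies if e <= gmec_energy + window]
    total = sum(math.exp(-e / RT) for e in included)
    return total, len(included)


# Hypothetical minimized energies (kcal/mol) for one sequence and state.
energies = [-10.0, -9.5, -6.0, -2.0]
full = sum(math.exp(-e / RT) for e in energies)
windowed, count = windowed_partition_function(energies, window=1.0)
print(count, windowed / full)
```

In this toy example the two conformations inside the 1 kcal/mol window account for nearly all of the full sum, mirroring the observation that low-energy conformations often dominate the partition function.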

Computational experiments

We implemented fries/EWAK* in the osprey suite of open source protein design algorithms [1]. fries was tested on 2,662 designs that range from an input sequence space size of 441 to 10,164 total sequences. The size of the reduced input sequence space produced by fries was compared to the size of the full input sequence space for each design. For these tests, fries returned every sequence within 8 kcal/mol of the wild-type sequence and was set to include only those sequences that are at most 2 orders of magnitude worse in partition function value than the wild-type. The results for these tests are described in the Section entitled “fries can reduce the size of the input sequence space by more than 2 orders of magnitude while retaining the most favorable sequences.” Computational experiments were also run comparing fries/EWAK* with the previous state-of-the-art algorithm in osprey: BBK* [32]. Using BBK* and fries/EWAK*, we computed the top 5 best binding sequences for 167 different designs to compare the running time of BBK* vs. fries/EWAK*. fries was limited to sequences within 4 kcal/mol of the wild-type sequence that are at most 2 orders of magnitude worse in partition function values than the wild-type. The EWAK* partition function approximations were limited to conformations within an energy window of 1 kcal/mol of the GMEC for each sequence. BBK* was set to return the top 5 sequences with an accuracy of ε = 0.68 (as was described in [32]). Using these same EWAK* and BBK* parameters, we also compared the change in the size of the conformation space necessary to compute an accurate K* score for BBK* vs. EWAK* for 661 partition functions from 161 design examples. The results for these tests are described in the Sections labeled “fries/EWAK* is up to 2 orders of magnitude faster than BBK*” and “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores”. 
The number of conformations that undergo minimization (as described in [12–15]) for each partition function calculation with EWAK* was also compared across different energy window sizes for 350 partition function calculations from 87 design examples. These partition function calculations were compared to BBK*’s partition function calculations with a demanded accuracy of ε = 0.10. This smaller ε allowed for more accurate approximations of the K* scores. The results for these tests are described in the Section entitled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores”.

Every design included a set of mutable residues along with a set of surrounding flexible residues (see Fig 1 for an example). All of these residues were allowed to be continuously flexible [12–15]. The designs were selected from 40 different protein structures (listed in S1 Table and also used in [32, 68]), and were run on 40-48 core Intel Xeon nodes with up to 200 GB of memory.

Computational results

fries can reduce the size of the input sequence space by more than 2 orders of magnitude while retaining the most favorable sequences

The number of remaining sequences after fries was compared to the size of the complete input sequence space. In the best case, when using fries, the sequence space was decreased by more than 2 orders of magnitude and the conformation space was decreased by just over 4 orders of magnitude. The sequence space was reduced an average of 49% and the conformation space was reduced an average of 40%. These results are broken down further in Fig 3.

Fig 3. Reduction in input sequence space size using fries.

(A) A pie chart representing the reduction in the sequence space in percentages across all 2,662 designs. 7% of the designs had a reduction in sequence space over 95%, 24% of the designs had a reduction in sequence space between 66-95%, 31% of the designs had a reduction in sequence space between 36-65%, 32% of the designs had a reduction in sequence space between 6-35%, and 6% of the designs had a reduction in sequence space under 5%. (B) and (C) plot the number of sequences remaining after using fries starting with 441 and 9,261 sequences total, respectively. The number of sequences remaining for each design are sorted in order of decreasing size of the remaining conformation space after fries.

fries/EWAK* is up to 2 orders of magnitude faster than BBK*

The overall runtime was compared between BBK* and fries/EWAK*. fries/EWAK* was an average of 62% faster than BBK* on 167 example design problems. fries removed unfavorable sequences (as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”) from the search space for 156 out of the 167 design problems. For the cases described in the Section entitled “Computational experiments,” fries/EWAK* performed consistently faster than BBK* (in 92% of the design examples) as shown in Fig 4, Panel A. The longest running BBK* design problem took nearly 8 days, whereas fries/EWAK* completed the same example in just under 2 hours. In contrast, the design problem that took the longest for fries/EWAK* out of the 167 tested only required about 22 hours (the same design took BBK* just over 178 hours).
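
Because the results mix percentage improvements with orders of magnitude, a small conversion sketch may help when reading them. It uses the rounded runtimes quoted above for the longest-running fries/EWAK* design (about 22 hours versus just over 178 hours for BBK*), so the outputs are approximate; the function name is hypothetical.

```python
import math


def speedup_summary(baseline_hours, new_hours):
    """Express one runtime comparison as a fold speed-up, a percent
    reduction in runtime, and orders of magnitude (log10 of the fold)."""
    fold = baseline_hours / new_hours
    percent_reduction = 100.0 * (1.0 - new_hours / baseline_hours)
    orders_of_magnitude = math.log10(fold)
    return fold, percent_reduction, orders_of_magnitude


# Rounded values quoted in the text: BBK* ~178 h, fries/EWAK* ~22 h.
fold, percent, orders = speedup_summary(178.0, 22.0)
print(f"{fold:.1f}x faster, {percent:.0f}% less time, {orders:.2f} orders of magnitude")
```

Under this conversion, a full 2 orders of magnitude corresponds to a 99% reduction in runtime, so percentage figures compress large fold changes into a narrow range near 100%.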

Fig 4. Comparing runtimes and the number of minimized conformations between fries/EWAK* and BBK* for a variety of designs.

(A) A plot of the runtime in seconds (the y-axis is on a log scale) for fries/EWAK* (blue dots) and BBK* (yellow dots) for 167 design examples. Each point represents one design and is plotted in increasing order of BBK* running time. fries/EWAK* was faster than BBK* 92% of the time with an average improvement of 62% over BBK* and a maximum improvement of 2.2 orders of magnitude. This improvement is evident in (A) since the blue dots (fries/EWAK* times) fall mostly below the yellow dots (BBK* times). (B) A plot of the number of conformations minimized (y-axis is on a log scale) for 661 partition function calculations from 161 design examples. The number of conformations minimized by EWAK* (blue dots) was less than the number of conformations minimized by BBK* (yellow dots) in 68% of these cases, as is evidenced by the blue dots landing mostly below the yellow dots. In the best case, EWAK* decreased the number of conformations by 1.1 orders of magnitude. The average percent reduction in the number of minimized conformations was 27%. (C) Each dot represents a calculated partition function. Yellow dots are partition functions limited to within a 1.0 kcal/mol window of the GMEC, red dots are partition functions limited to within a 3.0 kcal/mol window of the GMEC, and green dots are partition functions limited to within a 5.0 kcal/mol window of the GMEC. These dots are plotted according to the number of minimized conformations required for each corresponding BBK* partition function calculation. The solid black line represents the number of BBK* minimized conformations, so dots that fall below the black line represent examples that required fewer minimized conformations than with BBK*. As they approach the 5.0 kcal/mol window, the dots begin to converge with the BBK* line. However, as the number of BBK* minimized conformations rises beyond ∼10^4, even the green dots drop below the BBK* line.

EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores

We examined 661 K* score calculations, and concluded that the total number of conformations minimized to approximate the K* score was decreased by an average of 27%. In the best case the number of conformations minimized to approximate the K* score was decreased by 93%. These results are plotted in Fig 4, Panel B. Even though the partition function approximations were limited to a smaller conformation space with EWAK*, the K* scores did not differ by more than 0.2 orders of magnitude between EWAK* and BBK* for these 661 example K* score calculations.

A total of 350 of these 661 partition functions were subsequently re-estimated using BBK* with a more accurate, stringent ε value of 0.1 and using EWAK* with varied energy windows: 1.0 kcal/mol, 3.0 kcal/mol, and 5.0 kcal/mol. We examined the number of conformations minimized for each complex partition function calculation across the examples. When using 1.0 kcal/mol, EWAK* minimized up to 1.7 orders of magnitude fewer conformations (see Fig 4, Panel C for more details). Despite this decrease in the number of included conformations, EWAK* reported accurate K* scores. The largest difference in scores between BBK* and EWAK* was 0.3 orders of magnitude. This indicates that EWAK* retains accuracy when compared to previous provable algorithms, which have been extensively validated using experimental measurements of binding, crystal structures, and NMR structures on a variety of systems [22, 30, 36–38, 40–42]. The accuracy of EWAK* is explored further in the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas,” where we perform additional retrospective validation against experimental measurements.
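To make the energy-window idea concrete, the sketch below sums Boltzmann weights only over conformations whose energy lies within a chosen window of the lowest-energy conformation, and forms a log10 K* score as the ratio of the complex partition function to the product of the protein and ligand partition functions. This is a simplified illustration and not the osprey implementation: the function names, the value of RT (which assumes a temperature near 298 K), and the conformation energies are all hypothetical.

```python
import math

RT = 0.5925  # kcal/mol; gas constant times an assumed temperature of about 298 K

def windowed_partition_function(energies, window):
    """Sum Boltzmann weights over conformations within `window` kcal/mol of the lowest energy.

    Returns the truncated partition function and the number of conformations kept.
    """
    e_min = min(energies)
    kept = [e for e in energies if e - e_min <= window]
    q = sum(math.exp(-e / RT) for e in kept)
    return q, len(kept)

def log10_kstar(q_complex, q_protein, q_ligand):
    """log10 of the K* score, the ratio q_complex / (q_protein * q_ligand)."""
    return math.log10(q_complex) - math.log10(q_protein) - math.log10(q_ligand)

# Hypothetical minimized conformation energies (kcal/mol) for one complex.
complex_energies = [-30.0, -29.5, -28.0, -26.0, -24.0]

for window in (1.0, 3.0, 5.0):
    q, n_kept = windowed_partition_function(complex_energies, window)
    print(f"window {window:.1f} kcal/mol: {n_kept} conformations, "
          f"log10(q) = {math.log10(q):.3f}")
```

In this toy ensemble the lowest-energy conformations dominate the sum, so widening the window adds conformations but changes log10(q) only slightly, which is the intuition behind small score differences despite fewer minimized conformations.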

Computational redesign of the c-Raf-RBD:KRas protein-protein interface

We previously showed, investigating 58 mutations across 4 protein systems, that osprey can accurately predict the effect of mutations on PPI binding [1]. Herein, we tested the biological accuracy of the new modules fries and EWAK* after adding them to osprey in the case of a particular system: c-Raf-RBD in complex with KRas. The c-Raf Ras-binding domain (c-Raf-RBD) is a small self-folding domain that does not include the kinase signaling domains normally present in c-Raf. The c-Raf-RBD normally binds to KRas when KRas is GTP-bound (KRasGTP). KRas has been implicated in difficult-to-treat cancers such as pancreatic ductal adenocarcinoma (PDAC) and has therefore been thoroughly studied [47–60, 69, 70]. So, to further verify the accuracy and utility of fries/EWAK*, we focused on this already heavily optimized PPI between KRasGTP and one of its many effectors, c-Raf-RBD. First, in the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas,” we retrospectively investigated previously reported mutations in the c-Raf-RBD [48, 49, 60] and how they affect the binding of c-Raf-RBD to KRas. This retrospective study lays the groundwork for the prospective study we present that investigates novel mutations. So, following the retrospective study, we computationally redesigned the PPI using fries/EWAK* in search of new c-Raf-RBD variants with improved affinity for KRasGTP (see the Section entitled “Prospective redesign of the c-Raf-RBD:KRas protein-protein interface toward improved binding” for details). To perform these computational designs, we first made a homology model of c-Raf-RBD bound to KRasGTP (see S1 Text for details).

fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas

Each previously reported c-Raf-RBD variant [48, 49, 60] was tested computationally using fries/EWAK* by calculating a K* score, a computational approximation of Ka, for each variant along with its corresponding wild-type sequence. A percent change in binding was then calculated by comparing the variant’s K* score to the corresponding wild-type sequence’s K* score. The log10 of this value was then calculated and normalized to the wild-type by subtracting 2. A similar procedure was completed using the reported experimental data in order to easily compare the computationally predicted effect with the experimentally measured effect. The resulting value, called Δb, represents the change in binding. If a variant has a Δb less than 0, it is predicted to decrease binding. If a variant has a Δb greater than 0, it is predicted to increase binding. Δb values that are roughly equivalent to 0 indicate variants that have little to no effect on binding since the wild-type sequence was normalized to 0. The Δb values for the 41 computationally tested variants were plotted and compared to experimental values in Fig 5 (a table of these values is also presented in S2 Table).
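The Δb calculation described above can be sketched in a few lines of Python. We assume here that the "percent change in binding" is the variant's K* score expressed as a percentage of the wild-type K* score, which is the reading under which subtracting 2 places the wild-type at exactly 0; the function name and the K* values are hypothetical.

```python
import math

def delta_b(kstar_variant: float, kstar_wildtype: float) -> float:
    """Change in binding: log10 of the variant's score as a percent of wild-type, minus 2."""
    percent = 100.0 * kstar_variant / kstar_wildtype  # wild-type itself gives 100%
    return math.log10(percent) - 2.0

# Hypothetical K* scores, chosen only to illustrate the sign convention.
wild_type = 1.0e10
print(delta_b(wild_type, wild_type))  # wild-type is normalized to 0
print(delta_b(1.0e11, wild_type))     # positive: predicted to increase binding
print(delta_b(1.0e9, wild_type))      # negative: predicted to decrease binding
```

Under this assumption Δb reduces to log10 of the ratio of the two scores, so the same function can be applied to experimental binding values to place both data sets on a common scale.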

Fig 5. Predicting the effect of mutations in c-Raf-RBD on binding with KRas.

Each bar represents either the experimental (red) or computationally predicted (blue) effect each variant has on binding. The bars are sorted in increasing order of Δb value (see the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas”) of the experimental (red) bars. If the Δb value is less than 0, binding decreases. If the Δb value is greater than 0, binding increases. If the Δb value is close to 0, the effect is neutral. Quantitative values of K* tend to overestimate the biological effects of mutations (leading to the much larger blue bars) due to the limited nature of the input model compared to a biologically accurate representation. However, K* in general does a good job ranking variants, as can be seen here in Fig 6, in [1], and in [38]. Out of the 41 variants listed on the x-axis, only 3 were predicted incorrectly (marked with black asterisks) by EWAK*. In terms of accuracy, BBK* performed very similarly to EWAK*. However, in 2 cases (marked with green boxes), BBK* ran out of memory and was unable to calculate a score. BBK* also did not return values for the 2 variants marked with orange boxes. The variants marked with purple dots were tested in [60] experimentally (not computationally) and decreased binding of c-Raf-RBD to KRasGTP was observed, which EWAK* was able to predict correctly. The two variants marked with yellow triangles were computationally predicted in [60] to improve binding of c-Raf-RBD to KRasGTP. However, the experimental validation in [60] showed that these variants exhibit decreased binding, which EWAK* accurately predicted.

Out of the 41 variants tested (see S2 Table), EWAK* predicted the experimentally-reported effect (increased vs. decreased binding) correctly in 38 cases. The three designs where the effect was predicted incorrectly are marked with black asterisks in Fig 5. To make these predictions, the corresponding computational designs ranged in size from single point mutations up to 6 simultaneous mutations. Results are outlined in Fig 5 and data are presented in S2 Table. The Pearson’s r of the Δb values when comparing the experimental data to the computational predictions is 0.64. Furthermore, the Spearman’s ρ value (a measure of the correlation between two sets of rankings) when comparing the experimental data to the computational predictions is 0.81. This ρ value indicates that not only can EWAK* correctly predict the effect of a particular set of mutations, but that EWAK* also does a good job ranking the variants in order according to change in binding upon mutation (see Fig 6). We emphasize Spearman’s ρ here as opposed to a Pearson’s correlation since our current designs likely underestimate entropic contributions to binding due to solvent entropy, backbone entropy, and rotating methyl groups. Nevertheless, by explicitly modeling side-chain configurational entropy, our method considers more conformational entropy than GMEC-based methods; in [1, 38], large changes in K* score corresponded to significant changes in energy, and rankings correlated well with experimental binding measurements. The Spearman’s ρ for the study presented here is comparable to the values for other PPI systems when using osprey [1, 38]. Furthermore, an accurate ranking can guide an experimental lab in choosing the rank order in which to test computational predictions [2, 12, 16, 22, 29, 30, 36–42].

Fig 6. Comparing the computational EWAK* ranking with the experimental ranking for 41 c-Raf-RBD variants binding to KRas.

Each green dot represents a variant of c-Raf-RBD and is plotted according to the experimental ranking along with the corresponding computational ranking of its binding to KRas. A least squares fit line is shown in gray. Calculating the Pearson correlation coefficient between the two sets of rankings yields a Spearman’s ρ of 0.81.
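The relationship used in this caption, that Spearman's ρ is the Pearson correlation coefficient computed on ranks rather than on raw values, can be sketched as follows. The helper names are ours and the Δb values are hypothetical; they are not taken from S2 Table.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))

# Hypothetical experimental and predicted change-in-binding values.
experimental = [-1.2, -0.4, 0.0, 0.3, 0.9]
predicted = [-3.0, -2.5, 0.1, -0.2, 2.0]
print(f"Pearson r = {pearson(experimental, predicted):.2f}")
print(f"Spearman rho = {spearman(experimental, predicted):.2f}")
```

Because only the ordering enters the rank-based statistic, predictions that exaggerate the magnitude of an effect can still score a high ρ as long as the variants are ordered correctly.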

BBK* produced similarly accurate results, but took up to 10 times longer and failed to produce results in 4 cases. In particular, in 2 cases (marked in green in Fig 5), BBK* ran out of memory. These cases serve as examples of large designs where EWAK* outperforms BBK* and highlight the utility of fries/EWAK* when considering larger designs. In the 2 other cases (marked in orange in Fig 5), BBK* failed to return a result for the requested sequence in the top 5 reported sequences. This illustrates how EWAK* and fries are particularly helpful when performing these types of bigger designs that contain more simultaneous mutations and more flexible residues.

Finally, we compared our predictions to the interesting biological predictions in [60]. It is unclear how many mutants were computationally evaluated, but the authors do report computational predictions for 6 point mutations. Of those, point mutants R67L, N71R, and V88I were predicted to improve the intermolecular interactions between c-Raf-RBD and KRasGTP. However, experiments found that R67L and V88I actually reduced the binding of c-Raf-RBD to KRasGTP [48, 60]. In contrast to [60], EWAK* accurately predicted that these mutations decrease binding of c-Raf-RBD to KRasGTP. For a more detailed view of one of these designs, V88I, see Fig 7. Additionally, a number of mutations were combined and experimentally tested in [60]. Unfortunately, none of these variants improved binding to either KRasGTP or KRasGDP, which fries/EWAK* correctly predicted computationally (see Fig 5). In [60], the authors do not present any computational predictions for these combined variants, but our results show that a computational prediction using osprey’s EWAK* would have saved the time and resources taken to experimentally test these variants.

Fig 7. Redesign of c-Raf-RBD residue position 88 from valine to isoleucine.

The left-hand side shows c-Raf-RBD (yellow) in complex with KRas (pink). Panels (A-D) zoom in on one particular design at residue position 88 and are rotated 180°. Residue position 88 has a valine in the native, wild-type sequence (panels A & C) which was redesigned to an isoleucine (panels B & D). A mutation to isoleucine at this position was computationally predicted by EWAK* to decrease the binding of c-Raf-RBD to KRasGTP. This was experimentally validated in [60], where the authors incorrectly computationally predicted the effect of this particular mutation on the binding of c-Raf-RBD to KRasGTP. (A) The wild-type residue (valine) is shown in green with dots that indicate molecular interactions [71] with the surrounding residues (residues allowed to be flexible in the design are shown as lines). (B) The mutant residue (isoleucine) is shown in blue with dots that indicate molecular interactions [71] with the surrounding residues (residues allowed to be flexible in the design are shown as lines). Contacts made by the wild-type valine residue (circled dots in (A)) were lost upon mutation to isoleucine (circled space in (B)). (C & D) A set of 10 low-energy conformations that were included in the corresponding partition function calculation are shown for the wild-type (green) and the variant (blue).

Prospective redesign of the c-Raf-RBD:KRas protein-protein interface toward improved binding

The ability to accurately predict the effect mutations have on the binding of c-Raf-RBD to KRasGTP (see the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas”) gave us confidence in the EWAK* algorithm’s ability to predict new mutations in this interface toward a c-Raf-RBD variant that exhibits an even higher affinity for KRasGTP than previously reported variants, which focused on targeting KRasGDP [60]. Therefore, to do a prospective study, we computationally redesigned 14 positions in c-Raf-RBD in the c-Raf-RBD:KRas PPI to identify promising mutations. After extending osprey to include fries and EWAK*, 14 different designs were completed where each design included 1 mutable position that was allowed to mutate to all amino acid types except for proline. Each design also included a set of surrounding flexible residues within roughly 4 Å of the mutable residue. These designs were run using fries and EWAK* and included continuous flexibility [12–15]. fries was first used to limit each design to only the most favorable sequences (as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”) and then EWAK* was used to estimate the K* scores (as described in the Section entitled “Energy Window Approximation to K* (EWAK*)”). We report the upper and lower bounds on the EWAK* score for each design in Table 1 and S3 Table, where the listed sequences are those that were not pruned during the fries step. From these results, the predicted binding effect (increased vs. decreased) was determined based on comparing each variant’s K* score to its corresponding wild-type K* score. We then selected 5 novel point mutations (which, to our knowledge, are not reported in any existing literature) for experimental validation (see Table 1). It is worth noting that these 5 point mutations were selected out of an initial 294 possible mutations.
We limited our experimental validation to only these 5 new mutations and 2 previously reported mutations. This greatly reduced the amount of resources necessary for experimental validation compared to testing all 294 possibilities. Of the mutations selected, T57M was chosen to act as a variant that we computationally predicted to be comparable to wild-type. This variant was included to further verify the accuracy of osprey’s predictions. On the other hand, some of osprey’s top predictions were excluded. For instance, T57R (included in S3 Table) was not selected for experimental testing because it has an unsatisfied hydrogen bond, as evidenced in the structures calculated by osprey. Another example is position V69, where 3 different mutations are predicted to improve binding. However, this position was included in our retrospective study (see the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas” and Fig 5) and was 1 of only 3 positions where osprey incorrectly predicted the effect of the mutation. Therefore, we do not believe that the scores accurately represent the effect the mutations will have in these few cases. Other excluded top predictions (see S3 Table) displayed similar characteristics or have been reported and tested previously [48, 49, 60]. One special case that is not shown in our experimental validation below is V88W, which caused poor expression of c-Raf-RBD, so we were unable to test it.

Table 1. Computational predictions by osprey/fries/EWAK* that were selected for experimental validation.

Each row of the table shows the results of the redesign of a residue position in c-Raf-RBD in the c-Raf-RBD:KRas PPI that were also selected for experimental validation (all of the computational results are listed in S3 Table). The table contains the values for upper and lower bounds on log(K*) values (the calculation of these bounds is described in detail in [32]). Mutations highlighted in yellow, blue, and pink were selected for experimental testing and validation. The two residues highlighted in blue are the best previously discovered [60] mutations that improve binding (independently and additively) and are included in our tightest binding variant, c-Raf-RBD(RKY) (Figs 8, 9 and 10). The variants highlighted in yellow are, to the best of our knowledge, never-before-tested variants that are predicted to increase the binding of c-Raf-RBD to KRasGTP. The variant highlighted in pink was selected for experimental testing to act as a mutation predicted to be comparable to wild-type to test how accurately osprey predicted the effects of these mutations.

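| Mutation | Lower bound log(K*) | Upper bound log(K*) |
|---|---|---|
| T57M | 3.43 | 3.46 |
| T57 | 3.82 | 3.92 |
| T57K | 5.01 | 5.07 |
| N71 | 7.25 | 7.49 |
| N71R | 9.66 | 10.10 |
| A85 | 26.3 | 26.9 |
| A85K | 30.7 | 32.3 |
| K87 | 13.4 | 14.1 |
| K87Y | 14.1 | 14.2 |
| V88 | 16.5 | 16.6 |
| V88Y | 17.3 | 17.6 |
| V88F | 18.0 | 18.2 |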
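One way to read the bounds in Table 1 is to ask what range of log(K*) differences between a variant and its wild-type design is compatible with both sets of bounds. The sketch below does this with simple interval arithmetic. This is our own illustration rather than the procedure used to select variants, and it makes no claim about how large a difference should be considered meaningful; the function name is hypothetical, while the numbers are copied from Table 1.

```python
# (lower, upper) bounds on log(K*) copied from Table 1.
BOUNDS = {
    "T57": (3.82, 3.92), "T57M": (3.43, 3.46), "T57K": (5.01, 5.07),
    "N71": (7.25, 7.49), "N71R": (9.66, 10.10),
    "A85": (26.3, 26.9), "A85K": (30.7, 32.3),
    "K87": (13.4, 14.1), "K87Y": (14.1, 14.2),
    "V88": (16.5, 16.6), "V88Y": (17.3, 17.6), "V88F": (18.0, 18.2),
}

def log_kstar_difference(variant: str, wild_type: str):
    """Smallest and largest variant-minus-wild-type difference allowed by the bounds."""
    v_low, v_high = BOUNDS[variant]
    w_low, w_high = BOUNDS[wild_type]
    return v_low - w_high, v_high - w_low

for variant in ("T57M", "T57K", "N71R", "A85K", "K87Y", "V88Y", "V88F"):
    wild_type = variant[:-1]  # e.g. "V88Y" is compared against the "V88" row
    low, high = log_kstar_difference(variant, wild_type)
    print(f"{variant}: difference in log(K*) between {low:+.2f} and {high:+.2f}")
```

For V88Y, for example, the difference lies between roughly +0.7 and +1.1 log units, so the variant scores above wild-type even in the least favorable combination of bounds.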

Experimental validation of mutations in the c-Raf-RBD:KRas protein-protein interface

The mutations selected (highlighted in Table 1) from computational design were first screened using a bio-layer interferometry (BLI) single concentration assay (see the Section entitled “Bio-layer interferometry (BLI) dissociation rate and response screening” below). For this assay, we plotted response vs. dissociation rate constant (see Fig 8). This allowed us to quickly obtain a qualitative probe of c-Raf-RBD variant binding to KRas. It has been shown that off-rate measurements correlate to overall binding affinity [72–74]. A potential pitfall of depending only on off-rate observations is the potential for a slow off-rate to be paired with a slow on-rate, resulting in lower than expected affinity. Results from this initial single-concentration BLI screen (see Fig 8) suggested that, contrary to the computational predictions, the T57K and V88F variants decrease binding, whereas the T57M and K87Y mutations both have a roughly neutral effect on binding, which is consistent with the computational predictions. The final computationally predicted point mutant, V88Y, improves binding a comparable amount to the improvement seen with A85K or N71R, two previously reported variants that improve binding as correctly predicted by osprey and also experimentally tested herein. With the discovery of this new variant containing the point mutant V88Y (referred to herein as c-Raf-RBD(Y)), the next natural step was to combine it with the mutations found in the best reported variant, N71R and A85K (referred to herein as c-Raf-RBD(RK)). Therefore, we also included the double-mutant, c-Raf-RBD(RK), and the new triple-mutant, which contains N71R, A85K, and V88Y and is referred to herein as c-Raf-RBD(RKY), in our initial BLI screen. The c-Raf-RBD(RKY) variant was computationally predicted by fries/EWAK* to bind to KRasGTP more tightly than the previous best known binder, c-Raf-RBD(RK) (results are detailed in Fig 9).
Given the promising screening and computational predictions for the c-Raf-RBD(Y) and c-Raf-RBD(RKY) variants, we measured Kd values for each variant by titrating the analyte over the ligand in a full titration BLI-based assay (see Fig 10 and the Section entitled “Bio-layer interferometry (BLI) titration assay” below). Titration experiments showed strong qualitative agreement with our single concentration screen. Excitingly, the data from the full titration BLI assay (see Fig 10) indicate that c-Raf-RBD(RKY) binds KRasGTP roughly 5 times better than the previous best known binder, c-Raf-RBD(RK), and approximately 36 times better than wild-type c-Raf-RBD, the design starting point. Given how heavily studied the KRas system is, with many reported mutational and structural studies [47–60, 69, 70], this is a surprising discovery.

Fig 9. Computational predictions in the protein-protein interface of the c-Raf-RBD:KRas complex for c-Raf-RBD(RK) and the novel variant c-Raf-RBD(RKY).

Shown on the left is only the relevant protein-protein interface between c-Raf-RBD and KRas. Each panel zooms in on this interface and details a different c-Raf-RBD variant and its corresponding computational predictions. The upper and lower bounds on the log(K*) score for each design variant (wild-type, c-Raf-RBD(RK), and c-Raf-RBD(RKY)) are given in the bottom table. These computational predictions correspond with and are supported by the experimental results presented in the Section entitled “Experimental validation of mutations in the c-Raf-RBD:KRas protein-protein interface.” Panels (A) and (B) show the wild-type sequence, panels (C) and (D) show the variant c-Raf-RBD(RK), and panels (E) and (F) show the novel computationally predicted variant c-Raf-RBD(RKY). Panels (A), (C), and (E) show the wild-type, c-Raf-RBD(RK), and c-Raf-RBD(RKY), respectively, along with probe dots [71] that represent the molecular interactions within each structure calculated by osprey. These probe dots were selected to only show interactions between the residues included in the computational designs (shown as green and blue lines) with their surrounding residues. Panels (B), (D), and (F) show 10 low-energy structures from each conformational ensemble calculated by osprey/EWAK*. Panel (G) shows a zoomed-in overlay of the wild-type variant with the c-Raf-RBD variant that includes only the V88Y mutation. Purple arrows indicate the change in positioning of the lysine at residue position 84 upon mutation of residue position 88 from valine to tyrosine. When valine is present at position 88, the lysine residue (shown in green) primarily hydrogen bonds with an aspartate (labeled) in KRas. When valine is mutated to tyrosine (shown in cyan), the lysine at position 84 moves to make room for the tyrosine and positions itself to hydrogen bond with both the aspartate and the glutamate (labeled) in KRas.

Fig 8. Single-concentration experimental screening of c-Raf-RBD variants binding to KRas using BLI.

c-Raf-RBD variants at 250 nM were allowed to associate with KRasGppNHp immobilized on a Ni-NTA Octet Red96 BLI tip for 180 s and then dissociation was measured and fitted for 120 s. All dissociation fits were performed in a local 1:1 model and showed strong agreement with the data, every fit having an R2 greater than 0.99 and a χ2 lower than 0.01. The fitted dissociation rate constant (kd (1/s)) is plotted versus the response for each variant. Each point is labeled with its corresponding variant boxed in the corresponding color. A triplicate repeat was performed for the c-Raf-RBD wild-type (WT) variant (red). Variants fall into three groups: variants similar to WT (T57K in blue, T57M in cyan, WT in red, K87Y in orange, and V88F in forest green), variants better than WT (A85K in pink, N71R in sand, and V88Y in black), and variants with a response more than twice as large as WT (RK in purple and RKY in green). These results were used as a screen with the most promising variants being studied further by full titration BLI experiments (see Fig 10). The corresponding BLI response curves for this experiment are presented in S1 Figure.

Fig 10. BLI titration experiments to calculate Kd values for select c-Raf-RBD variants.

The plots shown here are representative and the data from replicate experiments is presented in S4 Table along with curves in S2 and S3 Figures. Each plot shows the data collected from a titration BLI experiment where the concentration of the c-Raf-RBD variant is incrementally increased. The concentrations for the wild-type variant were 10, 50, 150, 200, and 300 nM. The concentrations for all of the other variants were 10, 25, 25, 75, 75, 125, and 200 nM. Repeat intermediate concentrations were used as loading controls. These curves were then fit using a mass transport model within the Octet Data Analysis HT software provided by FortéBio in order to calculate the Kd value for each variant’s binding to KRas. The values in the table here (bottom right) are average Kd values shown with 2 standard deviations calculated from replicate experiments (see S4 Table, S2 and S3 Figures). The values presented here for wild-type, A85K, and c-Raf-RBD(RK) agree well with previously reported Kd values [60]. The best binding variant, c-Raf-RBD(RKY), binds to KRas about 5 times better than the previous tightest-known binder, c-Raf-RBD(RK), and about 36 times better than the design starting point, wild-type c-Raf-RBD.

Experimental materials and methods

Each variant of c-Raf-RBD was expressed and purified (see S2 Text) with the cysteine residues at positions 81 and 96 replaced by isoleucine and methionine, respectively. These mutations were previously reported to have a minimal effect on the stability of c-Raf-RBD [55] and their substitution allows for the use of the c-Raf-RBD constructs in other assays (not mentioned herein). Additionally, we do not believe these residue substitutions have a large effect since the Kd values determined herein align with previously reported Kd values [60] (see Fig 10). KRas was expressed and purified (see S3 Text) with a poly-histidine protein tag (His-tag) and loaded with a non-hydrolyzable GTP analog, GppNHp. KRas was also made to include a substitution at position 118 from cysteine to serine in order to increase expression and stability [75].

Bio-layer interferometry (BLI) dissociation rate and response screening

His-tagged KRasGppNHp was immobilized on nickel-nitrilotriacetic acid (Ni-NTA) biosensor tips and dipped into a single 250 nM concentration of each c-Raf-RBD variant using an Octet Red96 instrument (FortéBio). All samples were previously diluted in kinetics buffer (PBS [pH 7.2], 0.01% [w/v] BSA, 0.002% [v/v] Tween 20) supplemented with 200 mM NaCl, 5 mM MgCl2, and 1 mM TCEP. After steady state was achieved for all samples, samples were allowed to dissociate in kinetics buffer (PBS [pH 7.2], 0.01% [w/v] BSA, 0.002% [v/v] Tween 20) supplemented with 200 mM NaCl, 5 mM MgCl2, and 1 mM TCEP. A buffer blank and binding of c-Raf-RBD variants to Ni-NTA tips in the absence of KRasGppNHp were used as references for double subtraction. Curves (see Fig 8) were aligned on the y-axis to the average baseline, and an inter-step correction was aligned to the dissociation step. A dissociation-only 1:1 binding model was used to fit the dissociation rate for a window of 120 s.
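For readers unfamiliar with dissociation-only fitting, the sketch below illustrates the idea on synthetic data. It assumes the simplest form of a 1:1 dissociation model, in which the response decays exponentially toward zero as R(t) = R0·exp(−kd·t), and estimates kd from a straight-line fit to ln R(t). This is not the fitting routine of the Octet software; the function name, the rate constant, and the response values are all hypothetical.

```python
import math

def fit_dissociation_rate(times, responses):
    """Least-squares line through ln(response) vs. time.

    Returns (kd, r0) for the assumed model R(t) = r0 * exp(-kd * t).
    All responses must be positive.
    """
    logs = [math.log(r) for r in responses]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    slope = (sum((t - mean_t) * (v - mean_log) for t, v in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_log - slope * mean_t
    return -slope, math.exp(intercept)

# Synthetic, noise-free dissociation curve over a 120 s window.
true_kd = 0.01        # 1/s, hypothetical
true_r0 = 0.5         # response at the start of dissociation, hypothetical
times = [float(t) for t in range(0, 121, 10)]
responses = [true_r0 * math.exp(-true_kd * t) for t in times]

kd, r0 = fit_dissociation_rate(times, responses)
print(f"fitted kd = {kd:.4f} 1/s, fitted R0 = {r0:.3f}")
```

A smaller fitted kd corresponds to a slower off-rate, which is the quantity plotted against response in the single-concentration screen.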

Bio-layer interferometry (BLI) titration assay

Binding of wild-type c-Raf-RBD and its variants was experimentally measured using a bio-layer interferometry (BLI) titration assay. Ni-NTA tips were then used to perform the BLI experiments to determine binding of the c-Raf-RBD variants to KRasGppNHp (results along with replicates are shown in Figs 8 and 10, S4 Table, S2 and S3 Figures). All experiments were carried out in 30 mM phosphate pH 7.4, 327 mM NaCl, 2.7 mM KCl, 5 mM MgCl2, 1.5 mM TCEP, 0.1% BSA, and 0.02% Tween-20 + Kathon at 25°C with 1000 RPM shaking and a KRas loading concentration of 20 μg/ml. Each curve presented (see Fig 10) was fit using the built-in mass transport model within the Octet Data Analysis HT software provided by FortéBio. We only accepted fits with a sum of square deviations χ2 less than 1 (FortéBio recommends a value less than 3) and a coefficient of determination R2 greater than 0.98.

Discussion

fries and EWAK* are new, provable algorithms for more efficient ensemble-based computational protein design. Efficiency and efficacy were tested and shown across a total of 2,826 different design problems. An implementation of fries/EWAK* is available in the open-source protein design software OSPREY [1] and all of the data has been made available (see Data Availability Statement). fries/EWAK* in combination achieved a significant runtime improvement over the previous state-of-the-art, BBK*, with runtimes up to 2 orders of magnitude faster. EWAK* also limits the number of minimized conformations used in each K* score approximation by up to about 2 orders of magnitude while maintaining provable guarantees (see the Section entitled “Energy Window Approximation to K* (EWAK*)”). fries alone is capable of reducing the input sequence space while provably keeping all of the most energetically favorable sequences (see the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”), decreasing the size of the sequence space by more than 2 orders of magnitude, and leading to more efficient design given the smaller search space.

To further validate osprey with fries/EWAK*, we applied these algorithms to a well-studied and biomedically interesting system: the c-Raf-RBD:KRas PPI. First, we performed a series of retrospective designs where fries/EWAK* accurately predicted how a variety of mutations affect the binding of c-Raf-RBD to KRasGTP that previous computational methods had failed to accurately predict [60]. This success supports the use of osprey and fries/EWAK* to evaluate the effect mutations in the protein-protein interface of c-Raf-RBD:KRas have on binding (additional, similar successes of the K* algorithm are presented and discussed in [1]). fries/EWAK* also prospectively predicted the effect of new mutations in the c-Raf-RBD:KRas PPI and discovered a novel c-Raf-RBD mutation V88Y with improved affinity for KRas. We went on to combine this new mutation with two previously reported mutations, N71R and A85K [60], to create c-Raf-RBD(RKY), an even stronger binding c-Raf-RBD variant, which fries/EWAK* accurately predicted. We biochemically screened top predicted variants using an initial bio-layer interferometry (BLI) single-concentration assay. Only a promising subset of the computationally predicted and initially screened variants were then evaluated using a BLI titration assay to calculate Kd values for individual c-Raf-RBD variants. We determined that c-Raf-RBD(RKY) binds to KRasGTP roughly 36 times more tightly than wild-type c-Raf-RBD, making it the tightest known c-Raf-RBD variant binding partner of KRasGTP.

Given that numerous groups have explored this protein-protein interaction [47–59] and performed mutagenesis on c-Raf-RBD through rational means [47, 48, 56, 69], computational methods [49, 60], or high-throughput evolutionary methods [55, 70], and that none identified V88Y, this discovery validates our computational approach and the use of computational algorithms such as fries and EWAK* to redesign protein-protein interfaces toward improved binding. Additionally, previous mutations that enhanced the affinity of c-Raf-RBD to KRas did so by supercharging c-Raf-RBD [48, 49, 60]. In contrast, our mutation V88Y introduces a novel aromatic residue. The discovery that such a mutation can improve the binding of c-Raf-RBD to KRasGTP suggests that previous work has not completely explored the sequence space available to this binding interaction. These new c-Raf-RBD variants could be fused to cell-penetrating peptides and used as in-cell tools to further characterize KRas:effector signalling.

Supporting information

S1 Text. Homology model of c-Raf-RBD in complex with KRas.

(PDF)

S2 Text. Details of the expression and purification of c-Raf-RBD variants.

(PDF)

S3 Text. Details of the expression and purification of KRas.

(PDF)

S4 Text. Sequence and partition function approximation algorithms.

(PDF)

S5 Text. The relationship between the energy window stopping criterion and epsilon.

(PDF)

S1 Table. Protein structures used in computational experiments as described in the Section entitled “Computational materials and methods”.

Each protein structure has its PDB ID listed along with its molecule names as presented in the Protein Data Bank entry for each structure. Individual designs are not listed or described here, but the necessary code and data are provided for the interested reader (see Data Availability Statement).

(PDF)

S2 Table. Experimental and computational percent change in binding and rankings.

For each listed c-Raf-RBD variant, we give the experimental percent change in KRas binding relative to wild-type c-Raf-RBD as reported in [48] (no Ka values were reported in [48] so the corresponding entries are left blank here) and as calculated from reported binding values in [54] and [60] (reported here as Exp. Kd), the EWAK* computationally predicted percent change in binding, the Δb values as described in the Section entitled “fries/EWAK* retrospectively predicted the affect mutations in c-Raf-RBD have on binding to KRas,” and the rankings that correspond to these values. The Δb values are calculated as follows: log10(%) − 2 where % represents the percent change in binding upon mutation. The rankings have a Pearson correlation of 0.81. The Pearson correlation between the change in binding values Δb is 0.64.

(PDF)
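
The Δb transform and the correlation reported for S2 Table can be sketched as follows. This is an illustrative Python example, not the analysis script used for the table: it assumes the percentage is expressed relative to wild-type binding at 100% (so that wild-type gives Δb = 0), the helper names are hypothetical, and the input percentages are invented rather than taken from S2 Table.

```python
import math


def delta_b(percent_binding):
    """Delta-b = log10(%) - 2, so 100% of wild-type binding maps to 0."""
    return math.log10(percent_binding) - 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical percentages for four variants (not values from S2 Table).
experimental = [100.0, 250.0, 40.0, 10.0]
predicted = [100.0, 180.0, 70.0, 5.0]

exp_db = [delta_b(p) for p in experimental]
pred_db = [delta_b(p) for p in predicted]
r = pearson(exp_db, pred_db)
print(r)
```

Working on the log scale places a tenfold gain and a tenfold loss in binding at equal distances from zero (Δb of +1 and −1).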

S3 Table. Table of computational predictions for point mutants in c-Raf-RBD.

Each section of the table shows the results of the redesign of a residue position in c-Raf-RBD in the c-Raf-RBD:KRas PPI in order of increasing upper bound on log(K*). The table contains the values for upper and lower bounds on log(K*) values (these bounds are described in detail in [32]). *Design results for the wild-type amino acid identity for each position. Mutations that were selected for experimental testing and validation.

(PDF)

S4 Table. Kd values for each tested variant for all replicates of BLI titration experiments.

For each listed variant, we give the dissociation constant Kd for each BLI titration experiment calculated from the fit done using the built-in mass transport model within the Octet Data Analysis HT software provided by FortéBio. We only accepted fits with a sum of square deviations χ2 less than 1 (FortéBio recommends a value less than 3) and a coefficient of determination R2 greater than 0.98. Presented in the table in Fig 10 are averages of these Kd values.

(PDF)
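
The acceptance rule for fits described above (χ2 less than 1 and R2 greater than 0.98) followed by averaging can be sketched in Python. The record layout, field names, and numbers below are hypothetical; the actual fits were produced by the Octet Data Analysis HT software, which this sketch does not reproduce.

```python
from statistics import mean

CHI2_MAX = 1.0  # accept fits with chi-squared below this threshold
R2_MIN = 0.98   # accept fits with R-squared above this threshold


def accepted_kds(fits):
    """Return the Kd values of the fits that pass both quality thresholds."""
    return [f["kd_nM"] for f in fits if f["chi2"] < CHI2_MAX and f["r2"] > R2_MIN]


# Hypothetical fit records (not values from S4 Table).
fits = [
    {"kd_nM": 3.1, "chi2": 0.42, "r2": 0.995},
    {"kd_nM": 2.9, "chi2": 0.65, "r2": 0.991},
    {"kd_nM": 7.8, "chi2": 1.70, "r2": 0.970},  # fails both thresholds
]

kept = accepted_kds(fits)
average_kd = mean(kept)
print(kept, average_kd)
```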

S1 Figure. Curves for single concentration BLI screen of c-Raf-RBD variants.

c-Raf-RBD variants at 250 nM were allowed to associate with KRasGppNHp immobilized on a Ni-NTA OctetRed96 BLI tip for 180 s, and then dissociation was measured and fitted for 120 s. All dissociation fits were performed in a local 1:1 model and showed strong agreement with the data, with every fit having an R2 greater than 0.99 and a χ2 lower than 0.01. Each curve is labeled with its corresponding c-Raf-RBD variant boxed in the matching color. A triplicate repeat was performed for the c-Raf-RBD wild-type (WT) variant (red). Curves are grouped into three groups: variants similar to WT (T57K in blue, T57M in cyan, WT in red, K87Y in orange, and V88F in forest green), variants better than WT (A85K in pink, N71R in sand, and V88Y in black), and variants with a response greater than twice that of the WT (RK in purple and RKY in green).

(PDF)

S2 Figure. Replicate BLI titration curves of c-Raf-RBD(RKY) binding to immobilized KRas on NiNTA tips.

Titration experiments were conducted over different concentration ranges and for different association and dissociation times in order to avoid artifacts. Within each titration experiment, curves were fit globally to a mass transport model using the FortéBio Data Analysis HT software. All fits achieved an R2 greater than 0.99 and a χ2 smaller than 0.65. The two titration experiments on the left are replicates with concentrations ranging from 150 nM to 4.69 nM in a 2-fold serial dilution. The titration experiment on the top right has titrations ranging from 150 nM to 9.38 nM in a 2-fold serial dilution but with an extended association step. The titration in the bottom right contains binding curves with the following concentrations of c-Raf-RBD(RKY): 200 nM, 125 nM, 75 nM, 75 nM, 25 nM, 25 nM, and 10 nM. Note the in-experiment repetition of two concentrations (75 nM and 25 nM). This was done in order to control for response and curve shape within an experiment. Curves for the repeat concentrations show strong reproducibility, and alternating which repeat curves are used for the global fit changes the Kd within a range of 1.99 nM to 2.34 nM. Results from these four titration experiments were averaged to generate a dissociation constant and standard deviation for c-Raf-RBD(RKY). Results are reported in the manuscript as the dissociation constant ± two standard deviations.

(PDF)
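
Two pieces of arithmetic in this caption can be checked with a short Python sketch: the 2-fold serial dilution from 150 nM, and the summary of replicate Kd values as a mean ± two standard deviations. The helper names and the replicate values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def serial_dilution(top_nM, factor, steps):
    """Concentrations of a serial dilution, starting at `top_nM`."""
    return [top_nM / factor ** i for i in range(steps)]


def summarize(kds):
    """Return (mean, two sample standard deviations) for replicate Kd values."""
    return mean(kds), 2 * stdev(kds)


# Six 2-fold steps from 150 nM end at 4.6875 nM, i.e. about 4.69 nM.
series = serial_dilution(150.0, 2, 6)
print([round(c, 2) for c in series])

# Hypothetical replicate Kd values in nM (not values from S4 Table).
center, spread = summarize([2.0, 3.0, 4.0])
print(f"{center:.2f} ± {spread:.2f} nM")
```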

S3 Figure. Replicate BLI titration curves of c-Raf-RBD(RK) binding to immobilized KRas on NiNTA tips.

Titration experiments were conducted over different concentration ranges and for different association and dissociation times in order to avoid artifacts. Within each titration experiment, curves were fit globally to a mass transport model using the FortéBio Data Analysis HT software. All fits achieved an R2 greater than 0.98 and a χ2 smaller than 0.25. The titration experiment on the top left was done with the following concentrations of c-Raf-RBD(RK): 200 nM, 125 nM, 75 nM, 75 nM, 25 nM, 25 nM, and 10 nM. Note the in-experiment repetition of two concentrations (75 nM and 25 nM). This was done in order to control for response and curve shape within the experiment. Curves for the repeat concentrations show strong reproducibility, and alternating which repeat curves are used for the global fit changes the Kd within a range of 15.1 nM to 15.48 nM. The bottom left and top right titration experiments are replicates with concentrations ranging from 150 nM to 4.69 nM in a 2-fold serial dilution. Results from these three titration experiments were averaged to generate a dissociation constant and standard deviation for c-Raf-RBD(RK). Results are reported in the manuscript as the dissociation constant ± two standard deviations.

(PDF)

Acknowledgments

We thank Rachel Kimbrough, Chanelle Simmons, Catherine Ehrhart, and Kelly Huynh for assistance with experiments and biochemical validation, Ben Fenton and Michael Kennedy for the initial KRas plasmids, and Paul Modrich for the Rosetta 2(DE3) cell lines. We also thank Terrence Oas and all members of the lab for helpful discussions.

Data Availability

All of the computational experiments and code used and discussed in this manuscript are available from the Harvard Dataverse repository (https://doi.org/10.7910/DVN/VHIRNM). For new empirical designs, we recommend using the latest version of OSPREY available for free at http://www.cs.duke.edu/donaldlab/osprey.php. All computer code for the OSPREY system is also available on GitHub at https://github.com/donaldlab/OSPREY3, and is open-source and free.

Funding Statement

This work is primarily supported by the following grants from the National Institutes of Health (NIH): R01-GM078031 and R01-GM118543 to BRD. Website: www.nih.gov. In addition, AUL was partially supported by a PhRMA Foundation Pre Doctoral Fellowship in Informatics. Website: www.phrmafoundation.org. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hallen MA, Martin JW, Ojewole A, Jou JD, Lowegard AU, Frenkel MS, et al. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. Journal of Computational Chemistry. 2018;39(30):2494–2507. doi:10.1002/jcc.25522
  • 2. Ojewole A, Lowegard A, Gainza P, Reeve SM, Georgiev I, Anderson AC, et al. OSPREY predicts resistance mutations using positive and negative computational protein design. In: Computational Protein Design. Springer; 2017. p. 291–306.
  • 3. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol. 2013;523:87–107. doi:10.1016/B978-0-12-394292-0.00005-9
  • 4. Donald BR. Algorithms in Structural Molecular Biology. Cambridge, MA: MIT Press; 2011.
  • 5. Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Current Opinion in Structural Biology. 2016;39:16–26. doi:10.1016/j.sbi.2016.03.006
  • 6. Simoncini D, Allouche D, de Givry S, Delmas C, Barbe S, Schiex T. Guaranteed Discrete Energy Optimization on Large Protein Design Problems. J Chem Theory Comput. 2015;11(12):5980–9.
  • 7. Hallen MA, Donald BR. Protein Design by Algorithm. arXiv preprint arXiv:1806.06064. 2018.
  • 8. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A. 2000;97(19):10383–8. doi:10.1073/pnas.97.19.10383
  • 9. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–74. doi:10.1016/B978-0-12-381270-4.00019-6
  • 10. Lee C, Subbiah S. Prediction of protein side-chain conformation by packing optimization. Journal of Molecular Biology. 1991;217(2):373–388.
  • 11. Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins. 2000;40(3):389–408.
  • 12. Georgiev I, Lilien RH, Donald BR. The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem. 2008;29(10):1527–42. doi:10.1002/jcc.20909
  • 13. Gainza P, Roberts KE, Donald BR. Protein design using continuous rotamers. PLoS Comput Biol. 2012;8(1):e1002335. doi:10.1371/journal.pcbi.1002335
  • 14. Hallen MA, Gainza P, Donald BR. Compact Representation of Continuous Energy Surfaces for More Efficient Protein Design. J Chem Theory Comput. 2015;11(5):2292–306. doi:10.1021/ct501031m
  • 15. Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid-rotamer-like Efficiency. Research in Computational Molecular Biology (RECOMB). 2016;9649:122–136.
  • 16. Georgiev I, Donald BR. Dead-end elimination with backbone flexibility. Bioinformatics. 2007;23(13):i185–94.
  • 17. Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for backrub motions in protein design. Bioinformatics. 2008;24(13):i196–204. doi:10.1093/bioinformatics/btn169
  • 18. Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics. 2017;33(14):i5–i12. doi:10.1093/bioinformatics/btx277
  • 19. Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins. 2013;81(1):18–39. doi:10.1002/prot.24150
  • 20. Tzeng SR, Kalodimos CG. Protein activity regulation by conformational entropy. Nature. 2012;488(7410):236.
  • 21. Gilson MK, Given JA, Bush BL, McCammon JA. The statistical-thermodynamic basis for computation of binding affinities: a critical review. Biophys J. 1997;72(3):1047–69. doi:10.1016/S0006-3495(97)78756-3
  • 22. Chen CY, Georgiev I, Anderson AC, Donald BR. Computational structure-based redesign of enzyme activity. Proc Natl Acad Sci USA. 2009;106(10):3764–9. doi:10.1073/pnas.0900266106
  • 23. Sciretti D, Bruscolini P, Pelizzola A, Pretti M, Jaramillo A. Computational protein design with side-chain conformational entropy. Proteins. 2009;74(1):176–91.
  • 24. Georgiev I, Lilien RH, Donald BR. Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design. Bioinformatics. 2006;22(14):e174–83.
  • 25. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997;278(5335):82–7.
  • 26. Leach AR, Lemon AP. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins. 1998;33(2):227–39.
  • 27. Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, et al. A new framework for computational protein design through cost function network optimization. Bioinformatics. 2013;29(17):2129–36.
  • 28. Chazelle B, Kingsford C, Singh M. A Semidefinite Programming Approach to Side Chain Positioning with New Rounding Strategies. INFORMS Journal on Computing. 2004;16(4):380–392.
  • 29. Lilien RH, Stevens BW, Anderson AC, Donald BR. A novel ensemble-based scoring and search algorithm for protein redesign and its application to modify the substrate specificity of the gramicidin synthetase a phenylalanine adenylation enzyme. J Comput Biol. 2005;12(6):740–61.
  • 30. Roberts KE, Cushing PR, Boisguerin P, Madden DR, Donald BR. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS Comput Biol. 2012;8(4):e1002477. doi:10.1371/journal.pcbi.1002477
  • 31. Silver NW, King BM, Nalam MNL, Cao H, Ali A, Kiran Kumar Reddy GS, et al. Efficient Computation of Small-Molecule Configurational Binding Entropy and Free Energy Changes by Ensemble Enumeration. J Chem Theory Comput. 2013;9(11):5098–5115. doi:10.1021/ct400383v
  • 32. Ojewole AA, Jou JD, Fowler VG, Donald BR. BBK* (Branch and Bound over K*): A Provable and Efficient Ensemble-Based Algorithm to Optimize Stability and Binding Affinity over Large Sequence Spaces. Research in Computational Molecular Biology (RECOMB). 2017; p. 157–172.
  • 33. Viricel C, Simoncini D, Barbe S, Schiex T. Guaranteed weighted counting for affinity computation: Beyond determinism and structure. In: International Conference on Principles and Practice of Constraint Programming. Springer; 2016. p. 733–750.
  • 34. Traoré S, Allouche D, André I, Schiex T, Barbe S. Deterministic Search Methods for Computational Protein Design. Methods Mol Biol. 2017;1529:107–123.
  • 35. Traoré S, Roberts KE, Allouche D, Donald BR, André I, Schiex T, et al. Fast search algorithms for computational protein design. J Comput Chem. 2016;37(12):1048–58. doi:10.1002/jcc.24290
  • 36. Stevens BW, Lilien RH, Georgiev I, Donald BR, Anderson AC. Redesigning the PheA domain of gramicidin synthetase leads to a new understanding of the enzyme’s mechanism and selectivity. Biochemistry. 2006;45(51):15495–504.
  • 37. Frey KM, Georgiev I, Donald BR, Anderson AC. Predicting resistance mutations using protein design algorithms. Proc Natl Acad Sci U S A. 2010;107(31):13707–12. doi:10.1073/pnas.1002162107
  • 38. Reeve SM, Gainza P, Frey KM, Georgiev I, Donald BR, Anderson AC. Protein design algorithms predict viable resistance to an experimental antifolate. Proc Natl Acad Sci U S A. 2015;112(3):749–54. doi:10.1073/pnas.1411548112
  • 39. Gorczynski MJ, Grembecka J, Zhou Y, Kong Y, Roudaia L, Douvas MG, et al. Allosteric inhibition of the protein-protein interaction between the leukemia-associated proteins Runx1 and CBFbeta. Chem Biol. 2007;14(10):1186–97.
  • 40. Georgiev I, Schmidt S, Li Y, Wycuff D, Ofek G, Doria-Rose N, et al. Design of epitope-specific probes for sera analysis and antibody isolation. Retrovirology. 2012;9.
  • 41. Georgiev IS, Rudicell RS, Saunders KO, Shi W, Kirys T, McKee K, et al. Antibodies VRC01 and 10E8 neutralize HIV-1 with high breadth and potency even with IG-framework regions substantially reverted to germline. J Immunol. 2014;192(3):1100–6. doi:10.4049/jimmunol.1302515
  • 42. Rudicell RS, Kwon YD, Ko SY, Pegu A, Louder MK, Georgiev IS, et al. Enhanced potency of a broadly neutralizing HIV-1 antibody in vitro improves protection against lentiviral infection in vivo. J Virol. 2014;88(21):12669–82. doi:10.1128/JVI.02213-14
  • 43. A Phase 1, Single Dose Study of the Safety and Virologic Effect of an HIV-1 Specific Broadly Neutralizing Human Monoclonal Antibody, VRC-HIVMAB080-00-AB (VRC01LS) or VRC-HIVMAB075-00-AB (VRC07-523LS), Administered Intravenously to HIV-Infected Adults. ClinicalTrials.gov Identifier: NCT02840474. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT02840474
  • 44. Evaluating the Safety and Serum Concentrations of a Human Monoclonal Antibody, VRC-HIVMAB075-00-AB (VRC07-523LS), Administered in Multiple Doses and Routes to Healthy, HIV-uninfected Adults. ClinicalTrials.gov Identifier: NCT03387150. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT03387150
  • 45. VRC 610: Phase I Safety and Pharmacokinetics Study to Evaluate a Human Monoclonal Antibody (MAB) VRC-HIVMAB095-00-AB (10E8VLS) Administered Alone or Concurrently With MAB VRC-HIVMAB075-00-AB (VRC07-523LS) Via Subcutaneous Injection in Healthy Adults. ClinicalTrials.gov Identifier: NCT03565315. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT03565315
  • 46. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences. 2000;97(19):10383–10388.
  • 47. Nassar N, Horn G, Herrmann C, Block C, Janknecht R, Wittinghofer A. Ras/Rap effector specificity determined by charge reversal. Nature Structural and Molecular Biology. 1996;3(8):723.
  • 48. Fridman M, Maruta H, Gonez J, Walker F, Treutlein H, Zeng J, et al. Point mutants of c-raf-1 RBD with elevated binding to v-Ha-Ras. Journal of Biological Chemistry. 2000;275(39):30363–30371.
  • 49. Kiel C, Filchtinski D, Spoerner M, Schreiber G, Kalbitzer HR, Herrmann C. Improved binding of Raf to Ras·GDP is correlated with biological activity. Journal of Biological Chemistry. 2009;284(46):31893–31902. doi:10.1074/jbc.M109.031153
  • 50. Sydor JR, Seidel RP, Goody RS, Engelhard M. Cell-free synthesis of the Ras-binding domain of c-Raf-1: binding studies to fluorescently labelled H-Ras. FEBS letters. 1999;452(3):375–378.
  • 51. Herrmann C, Horn G, Spaargaren M, Wittinghofer A. Differential interaction of the ras family GTP-binding proteins H-Ras, Rap1A, and R-Ras with the putative effector molecules Raf kinase and Ral-guanine nucleotide exchange factor. Journal of Biological Chemistry. 1996;271(12):6794–6800.
  • 52. Herrmann C, Martin GA, Wittinghofer A. Quantitative analysis of the complex between p21 and the ras-binding domain of the human raf-1 protein kinase. Journal of Biological Chemistry. 1995;270(7):2901–2905.
  • 53. Lakshman B, Messing S, Schmid EM, Clogston JD, Gillette WK, Esposito D, et al. Quantitative biophysical analysis defines key components modulating recruitment of the GTPase KRAS to the plasma membrane. Journal of Biological Chemistry. 2019;294(6):2193–2207. doi:10.1074/jbc.RA118.005669
  • 54. Block C, Janknecht R, Herrmann C, Nassar N, Wittinghofer A. Quantitative structure-activity analysis correlating Ras/Raf interaction in vitro to Raf activation in vivo. Nature structural biology. 1996;3(3):244.
  • 55. Campbell-Valois FX, Tarassov K, Michnick S. Massive sequence perturbation of the Raf Ras binding domain reveals relationships between sequence conservation, secondary structure propensity, hydrophobic core organization and stability. Journal of molecular biology. 2006;362(1):151–171.
  • 56. Fridman M, Walker F, Catimel B, Domagala T, Nice E, Burgess A. c-Raf-1 RBD associates with a subset of active vH-Ras. Biochemistry. 2000;39(50):15603–15611.
  • 57. Fetics SK, Guterres H, Kearney BM, Buhrman G, Ma B, Nussinov R, et al. Allosteric effects of the oncogenic RasQ61L mutant on Raf-RBD. Structure. 2015;23(3):505–516.
  • 58. Gorman C, Skinner RH, Skelly JV, Neidle S, Lowe PN. Equilibrium and kinetic measurements reveal rapidly reversible binding of Ras to Raf. Journal of Biological Chemistry. 1996;271(12):6713–6719.
  • 59. Hunter JC, Manandhar A, Carrasco MA, Gurbani D, Gondi S, Westover KD. Biochemical and structural analysis of common cancer-associated KRAS mutations. Molecular cancer research. 2015;13(9):1325–1335.
  • 60. Filchtinski D, Sharabi O, Rüppel A, Vetter IR, Herrmann C, Shifman JM. What makes Ras an efficient molecular switch: a computational, biophysical, and structural study of Ras-GDP interactions with mutants of Raf. Journal of molecular biology. 2010;399(3):422–435.
  • 61. Lee J. New Monte Carlo algorithm: entropic sampling. Physical Review Letters. 1993;71(2):211.
  • 62. Nosé S. A molecular dynamics method for simulations in the canonical ensemble. Molecular physics. 1984;52(2):255–268.
  • 63. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970.
  • 64. Lou Q, Dechter R, Ihler AT. Dynamic Importance Sampling for Anytime Bounds of the Partition Function. In: Advances in Neural Information Processing Systems; 2017. p. 3196–3204.
  • 65. Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins. 2015;83(10):1859–77. doi:10.1002/prot.24870
  • 66. Hallen MA, Donald BR. COMETS (Constrained Optimization of Multistate Energies by Tree Search): A provable and efficient protein design algorithm to optimize binding affinity and specificity with respect to sequence. Journal of Computational Biology. 2016;23(5):311–321. doi:10.1089/cmb.2015.0188
  • 67. Sommer R, Wagner S, Varrot A, Nycholat CM, Khaledi A, Häussler S, et al. The virulence factor LecB varies in clinical isolates: consequences for ligand binding and drug discovery. Chemical Science. 2016;7(8):4990–5001. doi:10.1039/c6sc00696e
  • 68. Jou JD, Holt GT, Lowegard AU, Donald BR. Minimization-Aware Recursive K* (MARK*): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. In: International Conference on Research in Computational Molecular Biology. Springer; 2019. p. 101–119.
  • 69. Fridman M, Tikoo A, Varga M, Murphy A, Nur-E-Kamal M, Maruta H. The minimal fragments of c-Raf-1 and NF1 that can suppress v-Ha-Ras-induced malignant phenotype. Journal of Biological Chemistry. 1994;269(48):30105–30108.
  • 70. Campbell-Valois FX, Tarassov K, Michnick S. Massive sequence perturbation of a small protein. Proceedings of the National Academy of Sciences. 2005;102(42):14988–14993.
  • 71. Roberts KE. Protein Interaction Viewer. 2012. http://www.cs.duke.edu/donaldlab/software/proteinInteractionViewer/
  • 72. Ylera F, Harth S, Waldherr D, Frisch C, Knappik A. Off-rate screening for selection of high-affinity anti-drug antibodies. Analytical biochemistry. 2013;441(2):208–213.
  • 73. Perspicace S, Banner D, Benz J, Müller F, Schlatter D, Huber W. Fragment-based screening using surface plasmon resonance technology. Journal of biomolecular screening. 2009;14(4):337–349.
  • 74. Lad L, Clancy S, Kovalenko M, Liu C, Hui T, Smith V, et al. High-throughput kinetic screening of hybridomas to identify high-affinity antibodies using bio-layer interferometry. Journal of biomolecular screening. 2015;20(4):498–507.
  • 75. Sun Q, Burke JP, Phan J, Burns MC, Olejniczak ET, Waterson AG, et al. Discovery of small molecules that bind to K-Ras and inhibit Sos-mediated activation. Angewandte Chemie International Edition. 2012;51(25):6140–6143. doi:10.1002/anie.201201358
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007447.r001

Decision Letter 0

Roland L Dunbrack Jr, Nir Ben-Tal

6 Dec 2019

Dear Dr Donald,

Thank you very much for submitting your manuscript 'Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

As you can see, the two reviewers have quite contrasting views of the paper. If you decide to submit a revision, it is likely that we will involve a third reviewer. The first reviewer has valid criticisms of the experimental data and suggests additional computational benchmarking designed to test what the method is actually trying to achieve.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Roland L. Dunbrack Jr., Ph.D.

Associate Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Lowegard et al. present a development of the Donald lab Osprey design methodology to improve efficiency in the design of large sequence ensembles. If I understood correctly, Osprey carries out a combined sequence-design and backbone relaxation step and therefore requires enumerating sequence space. To reduce the enumerated space to a size that can be practically computed, they now compute the stability of each component of the protein system (for instance, receptor and ligand) to ensure that no mutant is too destabilising before considering all combined mutations. The authors tested their approach on a natural (and engineered) PPI involving KRas and retrospectively tested the ranking of mutants relative to previous mutational analyses. They also used the method to design a single-point mutant and found that it improved affinity fivefold relative to the starting point.

Major concerns:

1. The main message of the paper is rather difficult to distill. I wrote above how I understood the motivation for the current paper and how the authors addressed the computational problem, but this is not stated in the abstract and it was only by going through the methods that I could understand this (and I'm still not sure that I understood correctly). The methods are, as expected for such a paper, quite dense with formulae so the big picture is lost on a reader. My first suggestion is to clarify already in the abstract what is the main contribution of this paper, not just by stating that the method is more efficient, but why it is more efficient and for which problems. In this connection, the provability of the method is of far less significance to most users and developers of computational design methodology than its practical usefulness (accuracy, speed, scope). Nevertheless, the point about provability is repeated from the second word of the title to the end of the paper in excess of 30 times. My second suggestion is therefore to substantially reduce the emphasis on this point and highlight applicability and accuracy.

2. The authors chose the KRas system apparently because it is a drug target. They mention several times that KRas is undruggable and that this is therefore a biomedically significant problem. First, as the authors mention at one point in the text, KRas is actually not undruggable and there are now small molecules in clinical trials (or in the clinic; I'm not sure). Second, whether or not KRas is a drug target is beside the point for this paper, since KRas is an intracellular target and no matter what affinities the authors achieved, the designs could not be used in any conceivable way in treatments or even in drug discovery. I therefore recommend that they drop all reference to druggability and concentrate on the truly important aspect, which is that KRas has been studied extensively as a model PPI, thus providing an excellent data set for retrospective analysis.

3. The narrative of this study is confusing: the authors developed new methods to enable design within large sequence spaces, but in the end, they validated their method by testing it retrospectively against a predetermined set of mutants and in the prospective design of a single-point mutation. It seems that any molecular threading approach and the simplest mutational scanning method (FOLDX?) would be just as useful for this analysis as the newly developed method. This is, in my opinion, a critical point. I don't think that the validation presented here is sufficient and instead, the authors should show that in the design of large ensembles, they recapitulate known sequence signatures (for instance, natural sequence alignments), and they should do this for more than one protein. Such a benchmark would show that the method is indeed fast enough to be practically useful for large sequence spaces and that it yields more than anecdotal successes.

4. Regarding the single-point mutant that the method prospectively designed. This mutant showed at most fivefold improvement in affinity over the starting point engineered variant. This is quite modest to say the least, though the authors trumpet this result as "a discovery of some significance". The tone relating to this mutant should be reduced throughout the paper.

5. The BLI experimental methods are described far too briefly to understand what was actually done and what fits are reported. It is not clear what model was used to fit the data. It looks to me like the authors may have fitted a kinetic model but I'm not sure whether they fitted each curve to a separate kinetic model or all curves at once (the latter is the correct way of analysing these data). The fits to the RK variant (Fig10) are quite poor. Considering that they have 6 independent measurements, and 4 of them show poor fits, I think that this experiment needs to be redone and since this serves as the baseline measurement to judge the impact of the V-->Y designed mutation, this is quite important. I also suggest to draw the fits in black because some of the traces are quite close in tone to the red used to show the fits.

6. Also, typically when such modest improvements in affinity are reported (fivefold), and given that the fits are rather poor in the data that were presented, it's important to provide experimental replicates. It is difficult to say whether the fivefold effect is real or within the noise of the experimental setup.

Minor comments:

1. Eq. 2: define C,P,L

2. On pg. 17, the authors use Spearman correlations to check the rank order correlation in 38 mutations. Since the computational method reports energies and the binding experiments report KDs, why not use Pearson correlations?

3. Line 403: the authors state that they filtered mutations based on "promising K* scores and structure examination". It's important that they provide some guidelines about how they selected the mutation. This explanation cannot be reproduced by anyone.

4. The single-concentration BLI measurements are very unusual (Fig8). I'm not sure that they are meaningful at all since at a single concentration it's impossible to determine Kd and it's not very obvious what is the signal one measures. I recommend to drop this analysis as it is misleading, especially since the authors draw conclusions on the correlation between these experimental results and the computational analysis.

5. Lines 433-437: The authors make a lot of the fivefold improvement that the single point mutation exhibited. It's a quite modest improvement and seems almost anecdotal given that no other mutants were presented.

6. Lines 439-452: this belongs in methods.

7. 475: "we applied these algorithms to a biomedically significant design problem": this is not a biomedically significant design problem, because the design cannot be used in a biomedical context. KRas is biomedically significant drug target but the design is in no way a starting point for a drug. Again, the authors can highlight the datasets that are available for KRas but not the druggability aspects as they have no relevance to the work.

8. 502: "the discovery that such a mutation can improve the binding...is of considerable significance...eventually developing successful therapeutics". This is quite spurious. They tested a single mutation which exhibited a very modest effect and has no relevance to drug discovery.

Reviewer #2: The authors present two algorithmic improvements for the K* algorithm for predicting binding affinities at protein/protein interfaces, and then show that their improved algorithm is capable of accurately predicting binding energies with a retrospective analysis of mutations at the cRaf-RBD / KRas interface and by predicting several point mutations that improve binding relative to wild-type, including one mutant, V88Y, that when paired with two previously-reported mutations, creates the tightest-yet-known interface between these two proteins. The paper spans from theory to the bench and is an impressive piece of work.

The two algorithmic improvements the authors describe focus on the two levels at which the K* algorithm broaches an exponential amount of work, 1) at the conformation level, and 2) at the sequence level.

At the conformation level, the K* algorithm is designed to approximate K = q_pl / ( q_p x q_l ) by enumerating all conformations of a single sequence in order of increasing energy until the conformations remaining are at a high-enough energy that their contribution to the partition function is small -- small enough that the resulting K* approximation is within a user-provided constant, epsilon, of the full K. To improve performance here, the authors introduce the EWAK* algorithm. With EWAK*, instead of enumerating conformations until the epsilon-error threshold is reached, they stop when the bound on the energy of the next conformation is more than some user-provided deltaE above the energy of the best conformation. The authors contrast EWAK* with the previous algorithm, BBK*, and note that it produces significant performance improvements.

I found this section a little difficult to understand: A) The algorithmic improvement presented in the BBK* paper from 2017 focuses entirely on sequence-space pruning and not conformation-space pruning, and so EWAK* seems like it ought to be compared to K* (or that the text should say that BBK* and K* are the same for the sake of this comparison). B) It would be nice to understand how the user providing an energy threshold "1 kcal/mol" differs from the user changing epsilon to ".05" from "0.01". Is there a simple mathematical relationship between these numbers or are they related but not comparable?
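The two stopping criteria discussed above can be contrasted in a minimal sketch. This is an illustration only, not the OSPREY implementation: it assumes the conformation energies are already known exactly and sorted (the real algorithms work with bounds from a tree search), that the total number of conformations is known, and that RT is about 0.592 kcal/mol. The function names are hypothetical. The authors' own treatment of how the energy window relates to epsilon is in S5 Text.

```python
import math

RT = 0.592  # kcal/mol near 298 K; an assumed value for illustration


def boltzmann_weight(energy):
    """Boltzmann weight exp(-E/RT) of a single conformation."""
    return math.exp(-energy / RT)


def partition_energy_window(sorted_energies, window):
    """Energy-window stop: sum weights of conformations whose energy is
    within `window` kcal/mol of the lowest-energy conformation."""
    best = sorted_energies[0]
    q, used = 0.0, 0
    for e in sorted_energies:
        if e > best + window:
            break  # everything after this is outside the window
        q += boltzmann_weight(e)
        used += 1
    return q, used


def partition_epsilon(sorted_energies, epsilon):
    """Epsilon stop: enumerate until the running sum q* is guaranteed to
    satisfy q* >= (1 - epsilon) * q for the full partition function q."""
    n = len(sorted_energies)
    q = 0.0
    for i, e in enumerate(sorted_energies):
        q += boltzmann_weight(e)
        remaining = n - (i + 1)
        # Each unenumerated conformation has energy >= e, hence weight <= weight(e).
        upper_rest = remaining * boltzmann_weight(e)
        if upper_rest <= epsilon * (q + upper_rest):
            return q, i + 1
    return q, n


# Hypothetical energies (kcal/mol), sorted in increasing order.
energies = [0.0, 0.5, 1.5, 3.0, 6.0]
q_window, n_window = partition_energy_window(energies, window=1.0)
q_eps, n_eps = partition_epsilon(energies, epsilon=0.05)
print(n_window, q_window)
print(n_eps, q_eps)
```

In this toy setting the energy window fixes which conformations are counted without reference to how many remain, whereas the epsilon criterion depends on both the energy gap and the number of unenumerated conformations, which is one reason the two parameters are related but not directly interchangeable.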

2) At the sequence level, the authors present an algorithm, FRIES, for removing sequences that are higher in energy than the wild type sequence. Here both FRIES and BBK* enumerate sequences in order of decreasing bounds on their energies. With FRIES, the intuition on when to stop is that, after the wild-type sequence is encountered at an interface where one is looking for tighter binding, then there is not much point in continuing.

I found this section slightly confusing because it's not clear whether the energies are for the complex structure, the unbound structure, or both. It's also unclear why FRIES searches the multi-sequence tree until the wild type sequence is hit instead of going directly to the (clearly known!) wild type sequence. I believe what the authors mean to describe is that FRIES searches the multi-sequence tree and descends into the single-sequence conformation tree for each sequence it encounters _until_ it hits the wild type sequence, after which point it begins looking to stop sequence enumeration.

I would also be curious to understand why the authors chose to approximate q^(-)_wt using only a single conformation of the wild-type sequence instead of trying to estimate q*_wt accurately; the wild type sequence is just one more sequence among the very many sequences that FRIES and BBK* would enumerate.

Minor point: on page 5 the phrase "by up to more than 2 orders of magnitude" confuses me; is it "by up to 2 orders of magnitude" or "more than 2 orders of magnitude"?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007447.r003

Decision Letter 1

Roland L Dunbrack Jr, Nir Ben-Tal

11 Apr 2020

Dear Dr. Donald,

Thank you very much for submitting your manuscript "Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Both reviewers recommended publication. The second reviewer had some concerns about whether all the necessary data are presented in the paper. S/he suggests some work that might be outside the scope of the paper, and we think that does not need to be included in your final revision. But please respond to the other suggestions in a minor revision.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roland L. Dunbrack Jr., Ph.D.

Associate Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Some of the criticisms I raised before are still relevant to this revision: prospective validation is done on just one single-point mutant and the results are modest compared to data presented in recent years for protein design methods. Nevertheless, the authors made a very serious effort to clarify the message and provide more detail on the calculations and the experiments while also toning down some of the language that seemed to be carried away in the original submission.

I think that the major contribution of this paper is in the very extensive theoretical treatment. Further experimental validation including, possibly, a side-by-side comparison of this method with others may provide the answer to the questions I raised on the method's scope.

In summary, I would like to congratulate the authors on this work and also on their sincere efforts to address my previous comments which may have been phrased too harshly (my apologies for that).

Reviewer #2: I very much like this paper and would like to see it published soon. I think there are a handful of relatively small things that should be addressed that do not necessarily require another round of review.

I think the objections made by the first reviewer to the original submission were well deserved and that the changes that were made to the text in response have improved the manuscript considerably. Removing the repetitive emphasis on provability made the paper much more enjoyable to read.

I also think the point about using Spearman's rho instead of Pearson's r is still important and was not well addressed in the revision: the authors are not reporting r presumably because the T68K outlier makes r look quite bad. I can only guess that the authors suspect a reader would distill the paper to a single r value and look past Osprey as a tool for their project. I wanted to compute an r value myself looking to the data in Supp Table 3, but while the upper and lower bounds on log(K) for each mutation + wt is given, that does not readily translate into the delta-b value that is presented in Figure 5. Another column for delta-b would be nice. Also missing from Supp Table 3 is the WT dG values (where available) that a reader would need to compute r accurately.

A paper where the authors look into how well K* with Osprey's energy function performs compared to other techniques for binding energy calculations (e.g. FoldX) is definitely needed, but perhaps is beyond the scope of this work. On this topic, I would also add that I find the sentence explaining why K* does not do a good job predicting absolute values of binding energies

"since our current designs likely underestimate entropic contributions to binding upon mutation due to various limitations in biological modeling"

to be very unsatisfying! I was under the impression that K* would do a much better job than other techniques for estimating mutation ddGs because it considers side-chain entropic contributions explicitly. If entropy is poorly estimated by EWAK*, wouldn't it be estimated even more poorly by other molecular modeling applications? Finally, what aspect of entropy do you suspect is under-considered? ("various" doesn't help me). One question that perhaps could be addressed in a subsequent ddG comparison paper is: how well does K* (including FRIES and EWAK) perform using a different energy function? (Can the energy function that Osprey is using be swapped out with another energy function such as the one used in FoldX?)

I appreciate the clarity that was added to the algorithm description, especially with regard to the fact that the partition functions of the three species are approximated separately, and with the energy bound based on the WT sequence that FRIES uses to prune sequences.

The revision was missing almost all of the figures -- only figure 8 of the non-supplemental-figures is included. I had to go back to the original manuscript to find the others.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None

Reviewer #2: No: The data needed to reconstruct figure 5 is not fully present in supplemental table 3. It is possible for the reader to hunt down all of the previously reported experimentally measured ddGs from the previous literature, but it would be nice for the authors to simply include these numbers. It is less clear how one goes from the upper and lower bounds on the log(K) to the delta-b values that the authors use.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007447.r005

Decision Letter 2

Roland L Dunbrack Jr, Nir Ben-Tal

13 May 2020

Dear Dr. Donald,

We are pleased to inform you that your manuscript 'Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface' has been provisionally accepted for publication in PLOS Computational Biology.

Thank you for taking into account the comments made by the reviewers on the revised manuscript, and especially for providing the additional data requested.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Roland L. Dunbrack Jr., Ph.D.

Associate Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007447.r006

Acceptance letter

Roland L Dunbrack Jr, Nir Ben-Tal

2 Jun 2020

PCOMPBIOL-D-19-01654R2

Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface

Dear Dr Donald,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Homology model of c-Raf-RBD in complex with KRas.

    (PDF)

    S2 Text. Details of the expression and purification of c-Raf-RBD variants.

    (PDF)

    S3 Text. Details of the expression and purification of KRas.

    (PDF)

    S4 Text. Sequence and partition function approximation algorithms.

    (PDF)

    S5 Text. The relationship between the energy window stopping criterion and epsilon.

    (PDF)

    S1 Table. Protein structures used in computational experiments as described in the Section entitled “Computational materials and methods”.

    Each protein structure has its PDB ID listed along with its molecule names as presented in the Protein Data Bank entry for each structure. Individual designs are not listed or described here, but the necessary code and data are provided for the interested reader (see Data availability statement).

    (PDF)

    S2 Table. Experimental and computational percent change in binding and rankings.

    For each listed c-Raf-RBD variant, we give the experimental percent change in KRas binding relative to wild-type c-Raf-RBD as reported in [48] (no Ka values were reported in [48] so the corresponding entries are left blank here) and as calculated from reported binding values in [54] and [60] (reported here as Exp. Kd), the EWAK* computationally predicted percent change in binding, the Δb values as described in the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas,” and the rankings that correspond to these values. The Δb values are calculated as follows: log10(%) − 2 where % represents the percent change in binding upon mutation. The rankings have a Spearman correlation of 0.81. The Pearson correlation between the change in binding values Δb is 0.64.

    (PDF)
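The Δb transform and the two correlation measures named in the S2 Table caption can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the percent values are hypothetical placeholders rather than data from the table, the helper names are invented for this example, and the reading that 100% corresponds to unchanged binding (so that Δb = 0) is an inference from the formula log10(%) − 2.

```python
import math


def delta_b(percent_binding):
    """Δb = log10(%) − 2, where % is binding relative to wild type in percent."""
    return math.log10(percent_binding) - 2.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank 1 for the smallest value; this simple version assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical percent-binding values, NOT taken from S2 Table.
experimental_percent = [10.0, 50.0, 100.0, 400.0]
predicted_percent = [20.0, 30.0, 100.0, 1000.0]

exp_db = [delta_b(p) for p in experimental_percent]
pred_db = [delta_b(p) for p in predicted_percent]

print(pearson(exp_db, pred_db))    # correlation between Δb values
print(spearman(exp_db, pred_db))   # correlation between rankings
```

Because Spearman's coefficient depends only on rank order, it is insensitive to how far an outlier lies from the trend, whereas Pearson's coefficient on the Δb values is not.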

    S3 Table. Table of computational predictions for point mutants in c-Raf-RBD.

    Each section of the table shows the results of the redesign of a residue position in c-Raf-RBD in the c-Raf-RBD:KRas PPI in order of increasing upper bound on log(K*). The table contains the values for upper and lower bounds on log(K*) values (these bounds are described in detail in [32]). *Design results for the wild-type amino acid identity for each position. Mutations that were selected for experimental testing and validation.

    (PDF)

    S4 Table. Kd values for each tested variant for all replicates of BLI titration experiments.

    For each listed variant, we give the dissociation constant Kd for each BLI titration experiment calculated from the fit done using the built-in mass transport model within the Octet Data Analysis HT software provided by FortéBio. We only accepted fits with a sum of square deviations χ2 less than 1 (FortéBio recommends a value less than 3) and a coefficient of determination R2 greater than 0.98. Presented in the table in Fig 10 are averages of these Kd values.

    (PDF)

    S1 Figure. Curves for single concentration BLI screen of c-Raf-RBD variants.

    c-Raf-RBD variants at 250 nM were allowed to associate with KRasGppNHp immobilized on a Ni-NTA OctetRed96 BLI tip for 180 s and then dissociation was measured and fitted for 120 s. All dissociation fits were performed in a local 1:1 model and showed strong agreement with the data, every fit having an R2 greater than 0.99 and a χ2 lower than 0.01. Each curve is labeled with its corresponding c-Raf-RBD variant boxed in the matching color. A triplicate repeat was performed for the c-Raf-RBD wild-type (WT) variant (red). Curves are grouped into three groups: variants similar to WT (T57K in blue, T57M in cyan, WT in red, K87Y in orange, and V88F in forest green), variants better than WT (A85K in pink, N71R in sand, and V88Y in black), and variants with a response greater than twice that of the WT (RK in purple and RKY in green).

    (PDF)

    S2 Figure. Replicate BLI titration curves of c-Raf-RBD(RKY) binding to immobilized KRas on NiNTA tips.

    Titration experiments were conducted over different concentration ranges and for different association and dissociation times in order to avoid artifacts. Within each titration experiment, curves were fit globally to a mass transport model using the FortéBio Data Analysis HT software. All fits achieved an R2 greater than 0.99 and a χ2 smaller than 0.65. The two titration experiments on the left are replicates with concentrations ranging from 150 nM to 4.69 nM in a 2-fold serial dilution. The titration experiment on the top right has concentrations ranging from 150 nM to 9.38 nM in a 2-fold serial dilution but with an extended association step. The titration in the bottom right contains binding curves with the following concentrations of c-Raf-RBD(RKY): 200 nM, 125 nM, 75 nM, 75 nM, 25 nM, 25 nM, and 10 nM. Note the in-experiment repetition of two concentrations (75 nM and 25 nM). This was done in order to control for response and curve shape within an experiment. Curves for the repeat concentrations show strong reproducibility, and alternating which repeat curves are used for the global fit changes the Kd within a range of 1.99 nM to 2.34 nM. Results from these four titration experiments were averaged to generate a dissociation constant and standard deviation for c-Raf-RBD(RKY). Results are reported in the manuscript as the dissociation constant ± two standard deviations.

    (PDF)

    S3 Figure. Replicate BLI titration curves of c-Raf-RBD(RK) binding to immobilized KRas on NiNTA tips.

    Titration experiments were conducted over different concentration ranges and for different association and dissociation times in order to avoid artifacts. Within each titration experiment, curves were fit globally to a mass transport model using the FortéBio Data Analysis HT software. All fits achieved an R2 greater than 0.98 and a χ2 smaller than 0.25. The titration experiment on the top left was done with the following concentrations of c-Raf-RBD(RK): 200 nM, 125 nM, 75 nM, 75 nM, 25 nM, 25 nM, and 10 nM. Note the in-experiment repetition of two concentrations (75 nM and 25 nM). This was done in order to control for response and curve shape within the experiment. Curves for the repeat concentrations show strong reproducibility, and alternating which repeat curves are used for the global fit changes the Kd within a range of 15.1 nM to 15.48 nM. The bottom left and top right titration experiments are replicates with concentrations ranging from 150 nM to 4.69 nM in a 2-fold serial dilution. Results from these three titration experiments were averaged to generate a dissociation constant and standard deviation for c-Raf-RBD(RK). Results are reported in the manuscript as the dissociation constant ± two standard deviations.

    (PDF)

    Attachment

    Submitted filename: RTR.pdf

    Attachment

    Submitted filename: RTR_3.pdf

    Data Availability Statement

    All of the computational experiments and code used and discussed in this manuscript are available from the Harvard Dataverse repository (https://doi.org/10.7910/DVN/VHIRNM). For new empirical designs, we recommend using the latest version of OSPREY available for free at http://www.cs.duke.edu/donaldlab/osprey.php. All computer code for the OSPREY system is also available on GitHub at https://github.com/donaldlab/OSPREY3, and is open-source and free.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
