Abstract
The K* algorithm provably approximates partition functions for a set of states (e.g., protein, ligand, and protein-ligand complex) to a user-specified accuracy ε. Often, reaching an ε-approximation for a particular set of partition functions takes a prohibitive amount of time and space. To alleviate some of this cost, we introduce two new algorithms into the osprey suite for protein design: fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. fries pre-processes the sequence space to limit a design to only the most stable, energetically favorable sequence possibilities. EWAK* then takes this pruned sequence space as input and, using a user-specified energy window, calculates K* scores using the lowest energy conformations. We expect fries/EWAK* to be most useful in cases where there are many unstable sequences in the design sequence space and when users are satisfied with enumerating the low-energy ensemble of conformations. In combination, these algorithms provably retain calculational accuracy while limiting the input sequence space and the conformations included in each partition function calculation to only the most energetically favorable, effectively reducing runtime while still enriching for desirable sequences. This combined approach led to significant speed-ups compared to the previous state-of-the-art multi-sequence algorithm, BBK*, while maintaining its efficiency and accuracy, which we show across 40 different protein systems and a total of 2,826 protein design problems. Additionally, as a proof of concept, we used these new algorithms to redesign the protein-protein interface (PPI) of the c-Raf-RBD:KRas complex. The Ras-binding domain of the protein kinase c-Raf (c-Raf-RBD) is the tightest known binder of KRas, a protein implicated in difficult-to-treat cancers. fries/EWAK* accurately retrospectively predicted the effect of 41 different sets of mutations in the PPI of the c-Raf-RBD:KRas complex. 
Notably, these mutations include mutations whose effect had previously been incorrectly predicted using other computational methods. Next, we used fries/EWAK* for prospective design and discovered a novel point mutation that improves binding of c-Raf-RBD to KRas in its active, GTP-bound state (KRasGTP). We combined this new mutation with two previously reported mutations (which were highly-ranked by osprey) to create a new variant of c-Raf-RBD, c-Raf-RBD(RKY). fries/EWAK* in osprey computationally predicted that this new variant binds even more tightly than the previous best-binding variant, c-Raf-RBD(RK). We measured the binding affinity of c-Raf-RBD(RKY) using a bio-layer interferometry (BLI) assay, and found that this new variant exhibits single-digit nanomolar affinity for KRasGTP, confirming the computational predictions made with fries/EWAK*. This new variant binds roughly five times more tightly than the previous best known binder and roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD). This study steps through the advancement and development of computational protein design by presenting theory, new algorithms, accurate retrospective designs, new prospective designs, and biochemical validation.
Author summary
Computational structure-based protein design is an innovative tool for redesigning proteins to introduce a particular or novel function. One such function is improving the binding of one protein to another, which can increase our understanding of important protein systems. Herein we introduce two novel, provable algorithms, fries and EWAK*, for more efficient computational structure-based protein design, as well as their application to the redesign of the c-Raf-RBD:KRas protein-protein interface. These new algorithms speed up computational structure-based protein design while maintaining accurate calculations, allowing for larger, previously infeasible protein designs. Additionally, using fries and EWAK* within the osprey suite, we designed the tightest known binder of KRas, a heavily studied cancer target that interacts with a number of different proteins. This previously undiscovered variant of a KRas-binding domain, c-Raf-RBD, has the potential to serve as a tool to further probe the protein-protein interface of KRas with its effectors. Its discovery alone emphasizes the potential for more successful applications of computational structure-based protein design.
Introduction
Computational structure-based protein design (CSPD) is an innovative tool that enables the prediction of protein sequences with desired biochemical properties (such as improved binding affinity). osprey (Open Source Protein Redesign for You) [1] is an open-source, state-of-the-art software package used for CSPD and is available at http://www.cs.duke.edu/donaldlab/osprey.php for free. osprey’s algorithms focus on provably returning the optimal sequences and conformations for a given input model. In contrast, as argued in [2–7], stochastic, non-deterministic approaches [8–10] provide no guarantees on the quality of conformations, or sequences, and make determining sources of error in predicted designs very difficult.
When using osprey, the input model generally consists of a protein structure, a flexibility model (e.g., choice of sidechain or backbone flexibility, allowed mutable residues, etc.), and an all-atom pairwise-decomposable energy function that is used to evaluate conformations. osprey models amino acid sidechains using frequently observed rotational isomers or “rotamers” [11]. Additionally, osprey can also model continuous sidechain flexibility [12–15] along with discrete and continuous backbone flexibility [16–19], which allow for a more accurate approximation of protein behavior [13, 16, 20–23]. The output produced by CSPD generally consists of a set of candidate sequences and conformations. Many protein design methods have focused on computing a global minimum energy conformation (GMEC) [14, 18, 24–28]. However, a protein in solution exists not as a single, low-energy structure, but as a thermodynamic ensemble of conformations. Models that only consider the GMEC may incorrectly predict biophysical properties such as binding [12, 20–23, 29–31] because GMEC-based algorithms underestimate potentially significant entropic contributions. In contrast to GMEC-based approaches, the K* algorithm [12, 29, 30] in osprey provably approximates the Boltzmann-weighted partition function for a protein state, thereby modeling the thermodynamic ensemble. When designing for binding affinity, this enables the designer to calculate the K* score—a ratio of the Boltzmann-weighted partition functions for a protein-ligand complex that estimates the association constant, Ka (further detailed in the Section entitled “Computational materials and methods”). BBK* [32] is an efficient, multi-sequence design algorithm that calls the K* algorithm as a subroutine. Previous algorithms [12, 27, 29, 30, 33–35] that design for binding affinity using ensembles are linear in the size of the sequence space N, where N is exponential in the number of simultaneously mutable residue positions. 
BBK* is the first provable ensemble-based algorithm to run in time sublinear in N, making it possible not only to perform K* designs over large sequence spaces, but also to enumerate a gap-free list of sequences in order of decreasing K* score.
osprey has been used successfully on several empirical, prospective designs including designing enzymes [12, 16, 22, 29, 36], resistance mutations [2, 37, 38], protein-protein interaction inhibitors [30, 39], epitope-specific antibody probes [40], and broadly-neutralizing antibodies [41, 42]. These successes have been validated experimentally in vitro and in vivo and are now being tested in several clinical trials [43–45]. However, while osprey has been successful in the past, as the size of protein design problems grows (e.g., when considering a large protein-protein interface), enumerating and minimizing the necessary number of conformations and sequences to satisfy the provable halting criteria in previous K*-based algorithms [12, 29, 30] becomes prohibitive (despite recent algorithmic improvements [32]). The entire conformation space can be monumental in size and heavily populated with energetically unfavorable sequences and conformations. EWAK*, an Energy Window Approximation to K*, seeks to alleviate some of this difficulty by restricting the conformations included in each sequence’s thermodynamic ensemble. EWAK* guarantees that each conformational ensemble contains all of the lowest energy conformations within an energy window of the GMEC for each design sequence. fries, a Fast Removal of Inadequately Energied Sequences, also mitigates this complexity problem by limiting the input sequence space to only the most favorable, low energy sequences. Previous algorithms have focused on optimizing for sequences whose conformations are similar in energy to that of the GMEC. In contrast, fries focuses on optimizing for sequences with energies better than or comparable to that of the wild-type sequence. fries guarantees that the restricted input sequence space includes all of the sequences within an energy window of the wild-type sequence, but excludes any potentially unstable sequences with significantly worse partition function values. 
Wild-type sequences are generally expected to be near-optimal for their corresponding folds [46]. Therefore, limiting the sequence space to sequences energetically similar to or better than the wild-type sequence is reasonable.
We compare BBK* with K* (henceforth referred to as BBK*) to BBK* with EWAK* and fries (henceforth referred to as EWAK* and fries) to test our new methods. The implementation details of these algorithms involve some technical distinctions, which are discussed in S4 Text. Compared to the previous state-of-the-art algorithm BBK*, fries and EWAK* improve runtimes by up to 2 orders of magnitude, fries decreases the size of the sequence space by up to 2 orders of magnitude, and EWAK* decreases the number of conformations included in partition function calculations by up to almost 2 orders of magnitude. These improvements are shown across 2,826 protein design problems spanning 40 different protein systems (see the Section entitled “Computational experiments” for more details).
As a proof of concept to further test these algorithms and our design approach, we used fries and EWAK* to study the protein-protein interface (PPI) of KRasGTP in complex with its tightest-binding effector, c-Raf. As described in the Section entitled “Computational redesign of the c-Raf-RBD:KRas protein-protein interface,” KRas is an important cancer target that has been heavily studied and exhibits a thoroughly optimized protein-protein interface in its interactions with its effectors [47–59]. For this study, we focused on the re-design of the c-Raf Ras-binding domain (c-Raf-RBD), the tightest known naturally-occurring binding partner of KRas, in complex with KRasGTP (c-Raf-RBD:KRasGTP). First, our new algorithms successfully retrospectively predicted the effect on binding of mutations in the c-Raf-RBD:KRasGTP PPI even where other computational methods previously failed [60]. Next, we used fries/EWAK* prospectively to predict the effect of novel, previously unreported mutations in the PPI of the c-Raf-RBD:KRasGTP complex. We then screened the top osprey-predicted c-Raf-RBD variants using a bio-layer interferometry (BLI) assay single-concentration screen. Looking at the dissociation rates, this screen suggested that one of our new computationally-predicted c-Raf-RBD variants—c-Raf-RBD(Y), a c-Raf-RBD that includes the mutation V88Y—exhibits improved binding to KRasGTP. Next, we created a c-Raf-RBD variant, c-Raf-RBD(RKY), that included this new mutation, V88Y, together with two previously reported mutations [60], N71R and A85K. fries/EWAK* computationally predicted that c-Raf-RBD(RKY) would bind more tightly to KRasGTP than any other variant. The single-concentration screen using BLI also suggested that c-Raf-RBD(RKY) binds more tightly to KRasGTP than the previously reported best variant [60]. 
The Kd values for the most promising variants were measured using a BLI assay with titration, which confirmed our computational predictions. To the best of our knowledge, the novel construct c-Raf-RBD(RKY) is the highest-affinity variant ever designed, with single-digit nanomolar affinity for KRasGTP, and it binds roughly 36 times more tightly than the design starting point (wild-type c-Raf-RBD).
Computational materials and methods
The K* algorithm’s [12, 29, 30] K* score serves as an estimate of the binding constant, Ka, and is calculated by first approximating the Boltzmann-weighted partition function of each state: unbound protein (P), unbound ligand (L), and the bound protein-ligand complex (C). Each Boltzmann-weighted partition function Zx(s), x ∈ {P, L, C}, is defined as:
$$Z_x(s) = \sum_{d \in Q(s)} \exp\left(-E_x(d)/RT\right) \tag{1}$$
If s is any—generally amino acid—sequence of n residues, then Q(s) is the set of conformations defined by s, Ex(d) is the minimized energy of a conformation d in state x, and R and T are the gas constant and temperature, respectively. Many protein design algorithms approximate these partition functions for each state using either stochastic [61–64] or provable [2, 12, 29–31, 33, 64] methods.
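As a minimal illustration of the partition function definition above, the following Python sketch sums Boltzmann weights over a list of minimized conformation energies. It is not osprey code: the function name, the choice of kcal/mol units, and the default temperature of 298.15 K are assumptions made for this example. The sum is evaluated in log space so that strongly negative energies do not overflow.

```python
import math

# Gas constant in kcal/(mol*K); units are an assumption for this sketch.
R_KCAL = 1.9872e-3

def ln_partition_function(energies, temperature=298.15):
    """Return ln Z, where Z = sum over conformations d of exp(-E(d)/RT).

    `energies` is a non-empty list of minimized conformation energies
    (kcal/mol) for one sequence in one state (hypothetical input format).
    """
    rt = R_KCAL * temperature
    exponents = [-e / rt for e in energies]
    shift = max(exponents)  # log-sum-exp shift to avoid overflow
    return shift + math.log(sum(math.exp(x - shift) for x in exponents))

# Two conformations at the same energy double the partition function.
ln_z = ln_partition_function([0.0, 0.0])
```

Lower-energy conformations dominate the sum, which is the property the energy-window approach described later relies on.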
osprey’s K* algorithm approximates these partition functions to within a user-specified ε of the full partition function as defined in Eq (1) where C, P, and L refer to the protein-ligand complex, the unbound protein, and the unbound ligand, respectively. The binding affinity for sequence s is defined as:
$$K^*(s) = \frac{Z_C(s)}{Z_P(s)\, Z_L(s)} \tag{2}$$
The K* algorithm provably approximates this binding affinity. This is enabled by the use of A* [4, 12, 26, 65], which allows for the gap-free enumeration of conformations in order of increasing lower bounds on energy [26]. However, enumerating a sufficient number of these conformations to obtain a guaranteed ε-approximation can be very time-consuming because the set of all conformations Q(s) grows exponentially with the number of residues n. Also, the K* algorithm was originally [12, 29, 30] limited to computing a K* score for every sequence in the sequence space as defined by the input model for a particular design. However, BBK* [32] builds on K* and returns the top m sequences along with their ε-approximate K* scores and runs in time sublinear in the number of sequences. That is, BBK* does not require calculating ε-approximate K* scores for (or even examining) every sequence in the sequence space before it returns the top sequences. Nevertheless, BBK* may spend unnecessary time and resources evaluating unfavorable sequences before deciding to prune them. These previous methods, while efficient, suffer from two practical drawbacks. First, some returned sequences exhibit a large K* score (i.e., are predicted to improve binding) due to a decrease in stability of the unbound states. These sequences are rarely desirable in practice, since decreasing protein stability can result in poor folding and aggregation. Second, the approximation error for some sequences is slow to approach ε, which can lead to prohibitively slow designs.
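To make the K* score concrete, the sketch below combines the three partition functions into a log10 K* score, i.e., the ratio of the complex partition function to the product of the unbound protein and ligand partition functions. The function name and the convention of passing natural-log partition functions are assumptions for illustration only.

```python
import math

def log10_kstar(ln_z_complex, ln_z_protein, ln_z_ligand):
    """Return log10 of K* = Z_C / (Z_P * Z_L) from natural-log partition functions."""
    return (ln_z_complex - ln_z_protein - ln_z_ligand) / math.log(10.0)

# Toy values: Z_C = 1000, Z_P = 10, Z_L = 10 gives K* = 10.
score = log10_kstar(math.log(1000.0), math.log(10.0), math.log(10.0))
```

Note that a sequence can obtain a large score either by raising the complex partition function or by lowering the unbound ones; the latter is the destabilizing case described above as the first practical drawback.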
To overcome the above limitations of BBK* and K*, we introduce fries, a Fast Removal of Inadequately Energied Sequences, and EWAK*, an Energy Window Approximation to K*. These two algorithms limit the input sequence space and the number of conformations included in each partition function estimate when approximating a sequence’s K* score to only the most energetically favorable options (see Fig 1). The fries/EWAK* approach limits the number of conformations that must be enumerated (see the Section entitled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores”), which leads to significant speed-ups (see the Section entitled “fries/EWAK* is up to 2 orders of magnitude faster than BBK*”) because each enumerated conformation must undergo an energy minimization step. This minimization step is relatively expensive; therefore, anything that reduces the number of minimized conformations without sacrificing provable accuracy is desirable. For the importance of this minimization step to biological accuracy, see the discussions of continuous flexibility and its comparison to discrete flexibility in [4, 5, 7, 13, 14, 19]. EWAK* also maintains the advances made by BBK*, including running in time sublinear in the number of sequences N and returning sequences in order of decreasing K* score. fries and EWAK* are described in further detail in the Section entitled “Algorithms” below.
Algorithms
Fast removal of inadequately energied sequences (fries)
Generally in protein design, when optimizing a protein-protein interface (PPI) for affinity, the designer aims to improve the K* score of a variant sequence relative to the wild-type sequence and, when performing a design targeting a similar fold, to minimally perturb the native structure. To accomplish this, fries is guaranteed to keep only sequences whose partition function values are not markedly worse than the wild-type sequence’s partition function values for all of the design states (e.g., protein, ligand, and complex). How many orders of magnitude worse a particular sequence’s partition function values are allowed to be is determined by a user-specified value m. The fries algorithm prunes sequences that exhibit massive decreases in partition function values, which signal an increased risk of disturbing the native structure of the states in a given system. However, sequences with markedly worse, lower partition function values may be required when searching, for example, for resistance mutations, where positive and negative design are necessary [2, 37, 38]. Importantly, fries does still allow for sequences that may have lower, worse partition function values by allowing the user to specify how many orders of magnitude lower a candidate sequence’s partition function is allowed to be relative to the wild-type sequence’s partition function.
The following algorithm is applied to each of the three states (protein, ligand, and protein-ligand complex) independently. The resulting, filtered sequence space is determined by taking the intersection of the output from the algorithm for the three states. To prune the input sequence space, fries exploits A* over a multi-sequence tree (as is described and used in comets [66]), which enjoys a fast sequence enumeration in order of lower bound on minimized energy. Each sequence v in this multi-sequence tree [66] has a corresponding single-sequence conformation tree, viz., a tree that can be searched for the lowest energy conformations for a sequence v. fries first enumerates sequences (in order of energy lower bounds) in the multi-sequence tree until the wild-type sequence is found. Then, fries searches the wild-type’s corresponding single-sequence conformation tree using A*. The first conformation enumerated according to monotonic lower bound on pairwise minimized energy is then subjected to a full-atom minimization [30] to calculate the minimized energy of one of the wild-type sequence’s conformations $E_{WT}$. It is worth noting that fries only descends into and searches the single-sequence conformation tree for the wild-type sequence in order to calculate the provable halting criteria for Eq (3). fries then continues enumerating sequences in the multi-sequence tree in order of increasing lower bound on minimized energy (as described in more detail in [66]) until the lower bound on the optimal conformational energy for a sequence v, $E_v^{\ominus}$, is greater than $E_{WT} + w$, where $E_{WT}$ is as described above and w is a user-specified energy window value (see Fig 2). Any variant sequence v with a lower bound on minimized energy not satisfying the following criterion is pruned:
$$E_v^{\ominus} \leq E_{WT} + w \tag{3}$$
This criterion guarantees that the remaining, unpruned sequence space includes all sequences within an energy window of the wild-type sequence’s energy. fries enumerates sequences in order of increasing lower bound on minimized energy. Therefore, it calculates an upper bound $U_v$ on the partition function for each sequence v by Boltzmann-weighting the lower bound on its energy and multiplying it by the size of the conformation space for that particular sequence |Q(v)|:
$$U_v = |Q(v)| \cdot \exp\left(-E_v^{\ominus}/RT\right) \tag{4}$$
The lower bound $L_{WT}$ for the wild-type sequence is calculated by Boltzmann-weighting the minimized energy of the single conformation found during the sequence search for the wild-type sequence $E_{WT}$:
$$L_{WT} = \exp\left(-E_{WT}/RT\right) \tag{5}$$
$L_{WT}$ is a lower bound because, in the worst case, at least this one conformation will contribute to the partition function for the wild-type sequence. fries then uses these bounds to remove all of the sequences whose partition function value is not within some user-specified m orders of magnitude of the lower bound on the wild-type partition function $L_{WT}$. If the following criterion is not met, the sequence v is pruned from the space:
$$U_v \geq L_{WT} \cdot 10^{-m} \tag{6}$$
fries prunes sequences for the protein, the ligand, and the protein-ligand complex independently, limiting the input sequence space to exclude unfavorable sequences for all of the states. The resulting smaller sequence space is subsequently used as input for EWAK*. The set of sequences remaining is guaranteed to include all of the sequences within a user-specified energy window w of the wild-type sequence that also satisfy the partition function criterion given in Eq (6). Importantly, fries can be used to limit the size of the input sequence space in this fashion for any of the protein design algorithms available within osprey.
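The two pruning criteria and the intersection across states can be sketched as follows. This is a simplified illustration under stated assumptions, not the osprey implementation: it takes precomputed energy lower bounds and conformation counts per sequence instead of enumerating a multi-sequence tree with A*, and all names, the input format, the kcal/mol units, and the 298.15 K temperature are hypothetical. Bounds are compared in log10 space to avoid overflow.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K); assumed units

def fries_filter_state(candidates, e_wt, w, m, temperature=298.15):
    """Return the sequences kept for one state.

    candidates: dict mapping sequence -> (energy lower bound, |Q(v)|)
    e_wt: minimized energy of one wild-type conformation
    w: energy window (kcal/mol); m: allowed orders of magnitude
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)
    log10_l_wt = -e_wt / rt_ln10  # log10 of the wild-type lower bound
    kept = set()
    for seq, (e_lower, n_confs) in candidates.items():
        if e_lower > e_wt + w:  # energy window criterion
            continue
        # log10 of the upper bound: |Q(v)| * exp(-lower bound / RT)
        log10_u = math.log10(n_confs) - e_lower / rt_ln10
        if log10_u >= log10_l_wt - m:  # partition function criterion
            kept.add(seq)
    return kept

def fries(states, w, m):
    """Intersect the kept sets over all states (protein, ligand, complex)."""
    kept_per_state = [fries_filter_state(c, e_wt, w, m) for c, e_wt in states]
    return set.intersection(*kept_per_state)

# Toy data: "B" fails the energy window, "C" fails the partition function bound.
state = {"WT": (-10.0, 100), "A": (-9.0, 100), "B": (-1.0, 100), "C": (-5.0, 1)}
kept = fries_filter_state(state, e_wt=-9.5, w=8.0, m=2)
```

In this toy example, sequence "C" lies inside the 8 kcal/mol window but has so few conformations that its upper bound still falls more than 2 orders of magnitude below the wild-type lower bound, so both criteria contribute to pruning.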
Energy window approximation to K* (EWAK*)
After reducing the size of the input sequence space using fries, as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries),” EWAK* proceeds by using a variation on an existing algorithm: BBK* (described in [32]). The crucial difference between BBK* and EWAK* is that with EWAK* the ensemble of conformations used to approximate each K* score is limited to those within a user-specified energy window of the GMEC for each sequence. This guarantees to populate the partition function for a particular sequence and state with all of the lowest, most-favorable conformations (that fall within the user-specified energy window). Limiting the partition functions to only these energetically favorable conformations can effectively reduce runtime while still enriching for desirable sequences. These conformations often account for the majority of the full ε-approximate partition function (see the Section entitled “Computational materials and methods”) in traditional K* calculations [12]. Hence, EWAK* also empirically enjoys negligible loss in accuracy of K* scores (see the Sections labeled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores” and “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas”). EWAK* retains the beneficial aspects of BBK*, including returning sequences in order of decreasing predicted binding affinity and running in time sublinear in the number of sequences. For a discussion of the relationship between ε and the energy window w, the interested reader is invited to refer to the SI.
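The effect of the energy window on a single partition function can be illustrated with a short sketch. It assumes the minimized energies of a sequence's conformations are already available as a list, whereas the actual algorithm enumerates conformations with A* and minimizes them on the fly; the function names, units (kcal/mol), and temperature are assumptions for this example.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K); assumed units

def windowed_ensemble(energies, window):
    """Keep conformations whose energy is within `window` of the lowest one."""
    gmec = min(energies)  # stand-in for the GMEC energy of this sequence
    return [e for e in energies if e <= gmec + window]

def partition_function(energies, temperature=298.15):
    """Sum of Boltzmann weights exp(-E/RT) over the given conformations."""
    rt = R_KCAL * temperature
    return sum(math.exp(-e / rt) for e in energies)

# Toy ensemble: a 1 kcal/mol window keeps two of four conformations.
energies = [-10.0, -9.5, -8.0, -3.0]
kept = windowed_ensemble(energies, window=1.0)
fraction = partition_function(kept) / partition_function(energies)
```

For these toy energies the two retained conformations account for well over 95% of the full sum, which mirrors the observation that low-energy conformations often make up the majority of the partition function.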
Computational experiments
We implemented fries/EWAK* in the osprey suite of open source protein design algorithms [1]. fries was tested on 2,662 designs that range from an input sequence space size of 441 to 10,164 total sequences. The size of the reduced input sequence space produced by fries was compared to the size of the full input sequence space for each design. For these tests, fries returned every sequence within 8 kcal/mol of the wild-type sequence and was set to include only those sequences that are at most 2 orders of magnitude worse in partition function value than the wild-type. The results for these tests are described in the Section entitled “fries can reduce the size of the input sequence space by more than 2 orders of magnitude while retaining the most favorable sequences.” Computational experiments were also run comparing fries/EWAK* with the previous state-of-the-art algorithm in osprey: BBK* [32]. Using BBK* and fries/EWAK*, we computed the top 5 best binding sequences for 167 different designs to compare the running time of BBK* vs. fries/EWAK*. fries was limited to sequences within 4 kcal/mol of the wild-type sequence that are at most 2 orders of magnitude worse in partition function values than the wild-type. The EWAK* partition function approximations were limited to conformations within an energy window of 1 kcal/mol of the GMEC for each sequence. BBK* was set to return the top 5 sequences with an accuracy of ε = 0.68 (as was described in [32]). Using these same EWAK* and BBK* parameters, we also compared the change in the size of the conformation space necessary to compute an accurate K* score for BBK* vs. EWAK* for 661 partition functions from 161 design examples. The results for these tests are described in the Sections labeled “fries/EWAK* is up to 2 orders of magnitude faster than BBK*” and “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores.” 
The number of conformations that undergo minimization (as described in [12–15]) for each partition function calculation with EWAK* was also compared across different energy window sizes for 350 partition function calculations from 87 design examples. These partition function calculations were compared to BBK*’s partition function calculations with a demanded accuracy of ε = 0.10. This smaller ε allowed for more accurate approximations of the K* scores. The results for these tests are described in the Section entitled “EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores.”
Every design included a set of mutable residues along with a set of surrounding flexible residues (see Fig 1 for an example). All of these residues were allowed to be continuously flexible [12–15]. The designs were selected from 40 different protein structures (listed in S1 Table and also used in [32, 68]), and were run on 40-48 core Intel Xeon nodes with up to 200 GB of memory.
Computational results
fries can reduce the size of the input sequence space by more than 2 orders of magnitude while retaining the most favorable sequences
The number of remaining sequences after fries was compared to the size of the complete input sequence space. In the best case, when using fries, the sequence space was decreased by more than 2 orders of magnitude and the conformation space was decreased by just over 4 orders of magnitude. On average, the sequence space was reduced by 49% and the conformation space was reduced by 40%. These results are broken down further in Fig 3.
fries/EWAK* is up to 2 orders of magnitude faster than BBK*
The overall runtime was compared between BBK* and fries/EWAK*. fries/EWAK* was an average of 62% faster than BBK* on 167 example design problems. fries removed unfavorable sequences (as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”) from the search space for 156 out of the 167 design problems. For the cases described in the Section entitled “Computational experiments,” fries/EWAK* performed consistently faster than BBK* (in 92% of the design examples) as shown in Fig 4, Panel A. The longest running BBK* design problem took nearly 8 days, whereas fries/EWAK* completed the same example in just under 2 hours. In contrast, the design problem that took the longest for fries/EWAK* out of the 167 tested only required about 22 hours (the same design took BBK* just over 178 hours).
EWAK* limits the number of minimized conformations when approximating partition functions while maintaining accurate K* scores
We examined 661 K* score calculations, and concluded that the total number of conformations minimized to approximate the K* score was decreased by an average of 27%. In the best case the number of conformations minimized to approximate the K* score was decreased by 93%. These results are plotted in Fig 4, Panel B. Even though the partition function approximations were limited to a smaller conformation space with EWAK*, the K* scores did not differ by more than 0.2 orders of magnitude between EWAK* and BBK* for these 661 example K* score calculations.
A total of 350 of these 661 partition functions were subsequently re-estimated using BBK* with a more accurate, stringent ε value of 0.1 and using EWAK* with varied energy windows: 1.0 kcal/mol, 3.0 kcal/mol, and 5.0 kcal/mol. We examined the number of conformations minimized for each complex partition function calculation across the examples. When using 1.0 kcal/mol, EWAK* minimized up to 1.7 orders of magnitude fewer conformations (see Fig 4, Panel C for more details). Despite this decrease in the number of included conformations, EWAK* reported accurate K* scores. The largest difference in scores between BBK* and EWAK* was 0.3 orders of magnitude. This indicates that EWAK* retains accuracy when compared to previous provable algorithms, which have been extensively validated using experimental measurements of binding, crystal structures, and NMR structures on a variety of systems [22, 30, 36–38, 40–42]. The accuracy of EWAK* is explored further in the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas,” where we perform additional retrospective validation against experimental measurements.
Computational redesign of the c-Raf-RBD:KRas protein-protein interface
We previously showed, investigating 58 mutations across 4 protein systems, that osprey can accurately predict the effect of mutations on PPI binding [1]. Herein, we tested the biological accuracy of the new modules fries and EWAK* after adding them to osprey in the case of a particular system: c-Raf-RBD in complex with KRas. The c-Raf Ras-binding domain (c-Raf-RBD) is a small self-folding domain that does not include the kinase signaling domains normally present in c-Raf. The c-Raf-RBD normally binds to KRas when KRas is GTP-bound (KRasGTP). KRas has been implicated in difficult-to-treat cancers such as pancreatic ductal adenocarcinoma (PDAC) and has therefore been thoroughly studied [47–60, 69, 70]. So, to further verify the accuracy and utility of fries/EWAK*, we focused on this already heavily optimized PPI between KRasGTP and one of its many effectors, c-Raf-RBD. First, in the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas,” we retrospectively investigated previously reported mutations in the c-Raf-RBD [48, 49, 60] and how they affect the binding of c-Raf-RBD to KRas. This retrospective study lays the groundwork for the prospective study we present that investigates novel mutations. So, following the retrospective study, we computationally redesigned the PPI using fries/EWAK* in search of new c-Raf-RBD variants with improved affinity for KRasGTP (see the Section entitled “Prospective redesign of the c-Raf-RBD:KRas protein-protein interface toward improved binding” for details). To perform these computational designs, we first made a homology model of c-Raf-RBD bound to KRasGTP (see S1 Text for details).
fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas
Each previously reported c-Raf-RBD variant [48, 49, 60] was tested computationally using fries/EWAK* by calculating a K* score, a computational approximation of Ka, for each variant along with its corresponding wild-type sequence. A percent change in binding was then calculated by comparing the variant’s K* score to the corresponding wild-type sequence’s K* score. The log10 of this value was then calculated and normalized to the wild-type by subtracting 2. A similar procedure was completed using the reported experimental data in order to easily compare the computationally predicted effect with the experimentally measured effect. The resulting value, called Δb, represents the change in binding. If a variant has a Δb less than 0, it is predicted to decrease binding. If a variant has a Δb greater than 0, it is predicted to increase binding. Δb values that are roughly equivalent to 0 indicate variants that have little to no effect on binding since the wild-type sequence was normalized to 0. The Δb values for the 41 computationally tested variants were plotted and compared to experimental values in Fig 5 (a table of these values is also presented in S2 Table).
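The Δb calculation described above can be written out as a short sketch. It assumes that the "percent change" is the variant's K* score expressed as a percentage of the wild-type K* score, so that the wild-type maps to 100% and therefore to 0 after subtracting 2; the function name and this reading of the text are assumptions.

```python
import math

def delta_b(kstar_variant, kstar_wild_type):
    """Return log10(percent of wild-type K* score) - 2, so wild-type gives 0."""
    percent = 100.0 * kstar_variant / kstar_wild_type
    return math.log10(percent) - 2.0

# Toy values: a variant with a 10-fold higher K* score than wild-type.
improved = delta_b(1e11, 1e10)
```

Under this reading, Δb is simply the log10 ratio of the variant and wild-type scores, so a value of 1 corresponds to a predicted 10-fold improvement and a value of -1 to a predicted 10-fold loss.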
Out of the 41 variants tested (see S2 Table), EWAK* predicted the experimentally-reported effect (increased vs. decreased binding) correctly in 38 cases. The three designs where the effect was predicted incorrectly are marked with a star in Fig 5. To make these predictions, the corresponding computational designs ranged in size from single point mutations up to 6 simultaneous mutations. Results are outlined in Fig 5 and data is presented in S2 Table. The Pearson’s r of the Δb values when comparing the experimental data to the computational predictions is 0.64. Furthermore, the Spearman’s ρ value—a measure of the correlation between two sets of rankings—when comparing the experimental data to the computational predictions is 0.81. This ρ value indicates that not only can EWAK* correctly predict the effect of a particular set of mutations, but that EWAK* also does a good job ranking the variants in order according to change in binding upon mutation (see Fig 6). We emphasize Spearman’s ρ here as opposed to a Pearson’s correlation since our current designs likely underestimate entropic contributions to binding due to solvent entropy, backbone entropy, and rotating methyl groups. Nevertheless, by explicitly modeling side-chain configurational entropy, our method considers more conformational entropy than GMEC-based methods—in [1, 38] large changes in K* score corresponded to significant changes in energy, and rankings correlated well with experimental binding measurements. The Spearman’s ρ for the study presented here is comparable to the values for other PPI systems when using osprey [1, 38]. Furthermore, an accurate ranking can guide an experimental lab in choosing the rank order in which to test computational predictions [2, 12, 16, 22, 29, 30, 36–42].
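To make the distinction between the two correlation measures concrete, the sketch below computes sign agreement, Pearson's r, and Spearman's ρ for a small set of invented Δb values. The numbers are hypothetical and are not taken from S2 Table; the helper names are ours. The example is constructed so that the predictions exaggerate the magnitude of each effect while preserving its rank.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's r between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def sign_agreement(predicted, measured):
    """Count variants whose predicted and measured effects share a direction."""
    return sum((p > 0) == (m > 0) for p, m in zip(predicted, measured))

# Hypothetical delta-b values (not from S2 Table).
predicted = [0.2, -1.5, 3.0, -0.4, 0.9]
measured = [0.1, -0.3, 0.5, -0.2, 0.4]
print(sign_agreement(predicted, measured), "of", len(predicted), "directions agree")
print("Pearson r:", round(pearson(predicted, measured), 2))
print("Spearman rho:", round(spearman(predicted, measured), 2))
```

In this toy case the ranking is reproduced exactly even though the linear correlation is imperfect, which mirrors why a rank-based measure is emphasized when the magnitudes of the scores are expected to be systematically off.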
BBK* produced similarly accurate results, but took up to 10 times longer and failed to produce results in 4 cases. In 2 of these cases (marked in green in Fig 5), BBK* ran out of memory. These cases serve as examples of large designs where EWAK* outperforms BBK* and highlight the utility of fries/EWAK* when considering larger designs. In the 2 other cases (marked in orange in Fig 5), BBK* failed to return a result for the requested sequence in the top 5 reported sequences. This illustrates how EWAK* and fries are particularly helpful when performing these types of bigger designs that contain more simultaneous mutations and more flexible residues.
Finally, we compared our predictions to the interesting biological predictions in [60]. It is unclear how many mutants were computationally evaluated, but the authors do report computational predictions for 6 point mutations. Of those, point mutants R67L, N71R, and V88I were predicted to improve the intermolecular interactions between c-Raf-RBD and KRasGTP. However, experiments found that R67L and V88I actually reduced the binding of c-Raf-RBD to KRasGTP [48, 60]. In contrast to [60], EWAK* accurately predicted that these mutations decrease binding of c-Raf-RBD to KRasGTP. For a more detailed view of one of these designs, V88I, see Fig 7. Additionally, a number of mutations were combined and experimentally tested in [60]. Unfortunately, none of these variants improved binding to either KRasGTP or KRasGDP, which fries/EWAK* correctly predicted computationally (see Fig 5). In [60], the authors do not present any computational predictions for these combined variants, but our results show that a computational prediction using osprey’s EWAK* would have saved the time and resources taken to experimentally test these variants.
Prospective redesign of the c-Raf-RBD:KRas protein-protein interface toward improved binding
The ability to accurately predict the effect mutations have on the binding of c-Raf-RBD to KRasGTP (see the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas”) gave us confidence in the EWAK* algorithm’s ability to predict new mutations in this interface toward a c-Raf-RBD variant that exhibits an even higher affinity for KRasGTP than previously reported variants which focused on targeting KRasGDP [60]. Therefore, to do a prospective study, we computationally redesigned 14 positions in c-Raf-RBD in the c-Raf-RBD:KRas PPI to identify promising mutations. After extending osprey to include fries and EWAK*, 14 different designs were completed where each design included 1 mutable position that was allowed to mutate to all amino acid types except for proline. Each design also included a set of surrounding flexible residues within roughly 4 Å of the mutable residue. These designs were run using fries and EWAK* and included continuous flexibility [12–15]. fries was first used to limit each design to only the most favorable sequences (as described in the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”) and then EWAK* was used to estimate the K* scores (as described in the Section entitled “Energy Window Approximation to K* (EWAK*)”). We report the upper and lower bounds on the EWAK* score for each design in Table 1 and S3 Table, where the listed sequences are those that were not pruned during the fries step. From these results, the predicted binding effect (increased vs. decreased) was determined based on comparing each variant’s K* score to its corresponding wild-type K* score. We then selected 5 novel point mutations—that to our knowledge are not reported in any existing literature—for experimental validation (see Table 1). It is worth noting that these 5 point mutations were selected out of an initial 294 possible mutations. 
We limited our experimental validation to only these 5 new mutations and 2 previously reported mutations. This greatly reduced the amount of resources necessary for experimental validation compared to testing all 294 possibilities. Of the mutations selected, T57M was chosen as a variant that we computationally predicted to be comparable to wild-type; it was included to further verify the accuracy of osprey’s predictions. On the other hand, some of osprey’s top predictions were excluded. For instance, T57R (included in S3 Table) was not selected for experimental testing because it has an unsatisfied hydrogen bond, as evidenced in the structures calculated by osprey. Another example is position V69, where 3 different mutations are predicted to improve binding. However, this position was included in our retrospective study (see the Section entitled “fries/EWAK* retrospectively predicted the effect mutations in c-Raf-RBD have on binding to KRas” and Fig 5) and was 1 of only 3 positions where osprey incorrectly predicted the effect of the mutation. Therefore, we do not believe that the scores accurately represent the effect the mutations will have in these few cases. Other excluded top predictions (see S3 Table) displayed similar characteristics or have been reported and tested previously [48, 49, 60]. One special case that is not shown in our experimental validation below is V88W, which caused poor expression of c-Raf-RBD, so we were unable to test it.
Table 1. Computational predictions by osprey/fries/EWAK* that were selected for experimental validation.
Mutation | Lower Bound log(K*) | Upper Bound log(K*) |
---|---|---|
T57M | 3.43 | 3.46 |
T57 | 3.82 | 3.92 |
T57K | 5.01 | 5.07 |
N71 | 7.25 | 7.49 |
N71R | 9.66 | 10.10 |
A85 | 26.3 | 26.9 |
A85K | 30.7 | 32.3 |
K87 | 13.4 | 14.1 |
K87Y | 14.1 | 14.2 |
V88 | 16.5 | 16.6 |
V88Y | 17.3 | 17.6 |
V88F | 18.0 | 18.2 |
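As a worked illustration of comparing a variant to its wild-type row, the sketch below subtracts the bounds in Table 1 to obtain a guaranteed interval on the difference in log(K*). Two points are assumptions on our part: that rows without a substituted residue (e.g., V88) are the wild-type references for that design, and that this interval subtraction is a reasonable stand-in for the comparison actually used. No cutoff for "comparable to wild-type" is applied, because the text does not state one.

```python
# log(K*) bounds copied from Table 1 as (lower, upper).
TABLE_1 = {
    "T57": (3.82, 3.92), "T57M": (3.43, 3.46), "T57K": (5.01, 5.07),
    "N71": (7.25, 7.49), "N71R": (9.66, 10.10),
    "A85": (26.3, 26.9), "A85K": (30.7, 32.3),
    "K87": (13.4, 14.1), "K87Y": (14.1, 14.2),
    "V88": (16.5, 16.6), "V88Y": (17.3, 17.6), "V88F": (18.0, 18.2),
}

def wild_type_row(variant):
    """Assumed convention: 'V88Y' is compared against the row 'V88'."""
    return variant[:-1]

def log_ratio_bounds(variant):
    """Interval guaranteed to contain log(K* variant) - log(K* wild type)."""
    v_lo, v_hi = TABLE_1[variant]
    w_lo, w_hi = TABLE_1[wild_type_row(variant)]
    # Smallest possible difference pairs the variant's lower bound with the
    # wild type's upper bound; the largest pairs the opposite ends.
    return (v_lo - w_hi, v_hi - w_lo)

for name in ("T57M", "K87Y", "V88Y", "A85K"):
    lo, hi = log_ratio_bounds(name)
    print(f"{name}: {lo:+.2f} to {hi:+.2f}")
```

An interval that lies entirely above zero (as for V88Y and A85K here) indicates a predicted increase regardless of where the true scores fall within their bounds, whereas an interval that touches zero (K87Y) leaves the direction unresolved at this level of approximation.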
Experimental validation of mutations in the c-Raf-RBD:KRas protein-protein interface
The mutations selected (highlighted in Table 1) from computational design were first screened using a bio-layer interferometry (BLI) single concentration assay (see the Section entitled “Bio-layer interferometry (BLI) dissociation rate and response screening” below). For this assay, we plotted response vs. dissociation rate constant (see Fig 9). This allowed us to quickly obtain a qualitative probe of c-Raf-RBD variant binding to KRas. It has been shown that off-rate measurements correlate with overall binding affinity [72–74]. A potential pitfall of depending only on off-rate observations is that a slow off-rate may be paired with a slow on-rate, resulting in lower than expected affinity. Results from this initial single-concentration BLI screen (see Fig 8) suggested that, contrary to the computational predictions, the T57K and V88F variants decrease binding, whereas the T57M and K87Y mutations both have a roughly neutral effect on binding, which is consistent with the computational predictions. The final computationally predicted point mutant, V88Y, improves binding by an amount comparable to the improvement seen with A85K or N71R, two previously reported variants that improve binding, as correctly predicted by osprey and also experimentally tested herein. With the discovery of this new variant containing the point mutant V88Y (referred to herein as c-Raf-RBD(Y)), the next natural step was to combine it with the mutations found in the best reported variant, N71R and A85K (referred to herein as c-Raf-RBD(RK)). Therefore, we also included the double-mutant, c-Raf-RBD(RK), and the new triple-mutant (which contains N71R, A85K, and V88Y and is referred to herein as c-Raf-RBD(RKY)) in our initial BLI screen. The c-Raf-RBD(RKY) variant was computationally predicted by fries/EWAK* to bind to KRasGTP more tightly than the previous best known binder, c-Raf-RBD(RK) (results are detailed in Fig 9).
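The off-rate pitfall noted above follows from the standard relation for a simple 1:1 binding model, in which the equilibrium dissociation constant is the ratio of the dissociation and association rate constants. The symbols below are our notation and do not appear elsewhere in the text.

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
```

A slower off-rate therefore implies tighter binding only if the on-rate does not decrease by a similar factor, so an off-rate screen on its own can suggest, but not establish, a ranking by affinity.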
Given the promising screening and computational predictions for the c-Raf-RBD(Y) and c-Raf-RBD(RKY) variants, we measured Kd values for each variant by titrating the analyte over the ligand in a full titration BLI-based assay (see Fig 10 and the Section entitled “Bio-layer interferometry (BLI) titration assay” below). Titration experiments showed strong qualitative agreement with our single concentration screen. Excitingly, the full titration BLI data (see Fig 10) indicate that c-Raf-RBD(RKY) binds KRasGTP roughly 5 times more tightly than the previous best known binder, c-Raf-RBD(RK), and approximately 36 times more tightly than wild-type c-Raf-RBD, the design starting point. Given how heavily studied the KRas system is, with many reported mutational and structural studies [47–60, 69, 70], this is a surprising discovery.
Experimental materials and methods
Each variant of c-Raf-RBD was expressed and purified (see S2 Text) with the cysteine residues at positions 81 and 96 replaced by isoleucine and methionine, respectively. These mutations were previously reported to have a minimal effect on the stability of c-Raf-RBD [55], and their substitution allows for the use of the c-Raf-RBD constructs in other assays (not mentioned herein). Additionally, we do not believe these residue substitutions have a large effect since the Kd values determined herein align with previously reported Kd values [60] (see Fig 10). KRas was expressed and purified (see S3 Text) with a poly-histidine protein tag (His-tag) and loaded with a non-hydrolyzable GTP analog, GppNHp. KRas was also made to include a substitution at position 118 from cysteine to serine in order to increase expression and stability [75].
Bio-layer interferometry (BLI) dissociation rate and response screening
His-tagged KRasGppNHp was immobilized on nickel-nitrilotriacetic acid (Ni-NTA) biosensor tips and dipped into each c-Raf-RBD variant at a single concentration of 250 nM using an Octet Red96 instrument (FortéBio). All samples were previously diluted in kinetics buffer (PBS [pH 7.2], 0.01% [w/v] BSA, 0.002% [v/v] Tween 20) supplemented with 200 mM NaCl, 5 mM MgCl2, and 1 mM TCEP. After steady state was achieved for all samples, samples were allowed to dissociate in the same supplemented kinetics buffer. A buffer blank and binding of c-Raf-RBD variants to Ni-NTA tips in the absence of KRasGppNHp were used as references for double subtraction. Curves (see Fig 8) were aligned on the y-axis to the average baseline, and an inter-step correction was aligned to the dissociation step. A dissociation-only 1:1 binding model was used to fit the dissociation rate over a window of 120 s.
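For readers unfamiliar with dissociation-only fitting, the sketch below shows the idea on synthetic, noise-free data. It assumes the usual single-exponential form of a 1:1 dissociation model, R(t) = R0·exp(−k_off·t), and recovers k_off by a log-linear least-squares fit over a 120 s window. This is a simplification for illustration; it is not the fitting routine of the instrument software, and the rate constant and response amplitude used here are invented.

```python
import math

def fit_dissociation_rate(times, responses):
    """Fit ln R(t) = ln R0 - k_off * t by least squares; return (R0, k_off)."""
    logs = [math.log(r) for r in responses]  # requires positive responses
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -slope

# Synthetic dissociation curve: hypothetical R0 (nm) and k_off (1/s).
TRUE_R0, TRUE_KOFF = 0.5, 0.01
times = [float(t) for t in range(0, 121)]  # 120 s window, 1 s sampling
responses = [TRUE_R0 * math.exp(-TRUE_KOFF * t) for t in times]

r0, k_off = fit_dissociation_rate(times, responses)
print(f"fitted R0 = {r0:.3f} nm, k_off = {k_off:.4f} 1/s")
```

With real sensorgrams the responses contain noise and may approach zero, in which case a nonlinear fit of the exponential itself is generally preferable to fitting the logarithm.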
Bio-layer interferometry (BLI) titration assay
Binding of wild-type c-Raf-RBD and its variants was experimentally measured using a bio-layer interferometry (BLI) titration assay. Ni-NTA tips were used to perform the BLI experiments to determine binding of the c-Raf-RBD variants to KRasGppNHp (results along with replicates are shown in Figs 8 and 10, S4 Table, S2 and S3 Figures). All experiments were carried out in 30 mM phosphate pH 7.4, 327 mM NaCl, 2.7 mM KCl, 5 mM MgCl2, 1.5 mM TCEP, 0.1% BSA, and 0.02% Tween-20 + Kathon at 25°C with 1000 RPM shaking and a KRas loading concentration of 20 μg/ml. Each curve presented (see Fig 10) was fit using the built-in mass transport model within the Octet Data Analysis HT software provided by FortéBio. We only accepted fits with a sum of square deviations χ² less than 1 (FortéBio recommends a value less than 3) and a coefficient of determination R² greater than 0.98.
Discussion
fries and EWAK* are new, provable algorithms for more efficient ensemble-based computational protein design. Efficiency and efficacy were tested and shown across a total of 2,826 different design problems. An implementation of fries/EWAK* is available in the open-source protein design software OSPREY [1] and all of the data has been made available (see Data Availability Statement). fries/EWAK* in combination achieved a significant runtime improvement over the previous state-of-the-art, BBK*, with runtimes up to 2 orders of magnitude faster. EWAK* also limits the number of minimized conformations used in each K* score approximation by up to about 2 orders of magnitude while maintaining provable guarantees (see the Section entitled “Energy Window Approximation to K* (EWAK*)”). fries alone is capable of reducing the input sequence space while provably keeping all of the most energetically favorable sequences (see the Section entitled “Fast Removal of Inadequately Energied Sequences (fries)”), decreasing the size of the sequence space by more than 2 orders of magnitude, and leading to more efficient design given the smaller search space.
To further validate osprey with fries/EWAK*, we applied these algorithms to a well-studied and biomedically interesting system: the c-Raf-RBD:KRas PPI. First, we performed a series of retrospective designs where fries/EWAK* accurately predicted how a variety of mutations affect the binding of c-Raf-RBD to KRasGTP, including effects that previous computational methods had failed to predict accurately [60]. This success supports the use of osprey and fries/EWAK* to evaluate the effect that mutations in the protein-protein interface of c-Raf-RBD:KRas have on binding (additional, similar successes of the K* algorithm are presented and discussed in [1]). fries/EWAK* also prospectively predicted the effect of new mutations in the c-Raf-RBD:KRas PPI and discovered a novel c-Raf-RBD mutation, V88Y, with improved affinity for KRas. We went on to combine this new mutation with two previously reported mutations, N71R and A85K [60], to create c-Raf-RBD(RKY), an even stronger binding c-Raf-RBD variant, which fries/EWAK* accurately predicted. We biochemically screened top predicted variants using an initial bio-layer interferometry (BLI) single-concentration assay. Only a promising subset of the computationally predicted and initially screened variants was then evaluated using a BLI titration assay to calculate Kd values for individual c-Raf-RBD variants. We determined that c-Raf-RBD(RKY) binds to KRasGTP roughly 36 times more tightly than wild-type c-Raf-RBD, making it the tightest known c-Raf-RBD variant binding partner of KRasGTP.
Given that numerous groups have explored this protein-protein interaction [47–59] and performed mutagenesis on c-Raf-RBD through rational means [47, 48, 56, 69], computational methods [49, 60], or high-throughput evolutionary methods [55, 70], and that none identified V88Y, this discovery validates our computational approach and the use of computational algorithms such as fries and EWAK* to re-design protein-protein interfaces toward improved binding. Additionally, previous mutations that enhanced the affinity of c-Raf-RBD to KRas did so by supercharging c-Raf-RBD [48, 49, 60]. In contrast, our mutation V88Y introduces a novel, aromatic residue. The discovery that such a mutation can improve the binding of c-Raf-RBD to KRasGTP suggests that previous work has not completely explored the sequence space available to this binding interaction. These new c-Raf-RBD variants could be fused to cell-penetrating peptides and used as in-cell tools to further characterize KRas:effector signalling.
Supporting information
Acknowledgments
We thank Rachel Kimbrough, Chanelle Simmons, Catherine Ehrhart, and Kelly Huynh for assistance with experiments and biochemical validation, Ben Fenton and Michael Kennedy for the initial KRas plasmids, and Paul Modrich for the Rosetta 2(DE3) cell lines. We also thank Terrence Oas and all members of the lab for helpful discussions.
Data Availability
All of the computational experiments and code used and discussed in this manuscript are available from the Harvard Dataverse repository (https://doi.org/10.7910/DVN/VHIRNM). For new empirical designs, we recommend using the latest version of OSPREY available for free at http://www.cs.duke.edu/donaldlab/osprey.php. All computer code for the OSPREY system is also available on GitHub at https://github.com/donaldlab/OSPREY3, and is open-source and free.
Funding Statement
This work is primarily supported by the following grants from the National Institutes of Health (NIH): R01-GM078031 and R01-GM118543 to BRD. Website: www.nih.gov. In addition, AUL was partially supported by a PhRMA Foundation Pre Doctoral Fellowship in Informatics. Website: www.phrmafoundation.org. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Hallen MA, Martin JW, Ojewole A, Jou JD, Lowegard AU, Frenkel MS, et al. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. Journal of Computational Chemistry. 2018;39(30):2494–2507. doi:10.1002/jcc.25522
- 2. Ojewole A, Lowegard A, Gainza P, Reeve SM, Georgiev I, Anderson AC, et al. OSPREY predicts resistance mutations using positive and negative computational protein design. In: Computational Protein Design. Springer; 2017. p. 291–306.
- 3. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods Enzymol. 2013;523:87–107. doi:10.1016/B978-0-12-394292-0.00005-9
- 4. Donald BR. Algorithms in Structural Molecular Biology. Cambridge, MA: MIT Press; 2011.
- 5. Gainza P, Nisonoff HM, Donald BR. Algorithms for protein design. Current Opinion in Structural Biology. 2016;39:16–26. doi:10.1016/j.sbi.2016.03.006
- 6. Simoncini D, Allouche D, de Givry S, Delmas C, Barbe S, Schiex T. Guaranteed Discrete Energy Optimization on Large Protein Design Problems. J Chem Theory Comput. 2015;11(12):5980–9.
- 7. Hallen MA, Donald BR. Protein Design by Algorithm. arXiv preprint arXiv:180606064. 2018.
- 8. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A. 2000;97(19):10383–8. doi:10.1073/pnas.97.19.10383
- 9. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–74. doi:10.1016/B978-0-12-381270-4.00019-6
- 10. Lee C, Subbiah S. Prediction of protein side-chain conformation by packing optimization. Journal of Molecular Biology. 1991;217(2):373–388.
- 11. Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins. 2000;40(3):389–408.
- 12. Georgiev I, Lilien RH, Donald BR. The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem. 2008;29(10):1527–42. doi:10.1002/jcc.20909
- 13. Gainza P, Roberts KE, Donald BR. Protein design using continuous rotamers. PLoS Comput Biol. 2012;8(1):e1002335. doi:10.1371/journal.pcbi.1002335
- 14. Hallen MA, Gainza P, Donald BR. Compact Representation of Continuous Energy Surfaces for More Efficient Protein Design. J Chem Theory Comput. 2015;11(5):2292–306. doi:10.1021/ct501031m
- 15. Hallen MA, Jou JD, Donald BR. LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid-rotamer-like Efficiency. Research in Computational Molecular Biology (RECOMB). 2016;9649:122–136.
- 16. Georgiev I, Donald BR. Dead-end elimination with backbone flexibility. Bioinformatics. 2007;23(13):i185–94.
- 17. Georgiev I, Keedy D, Richardson JS, Richardson DC, Donald BR. Algorithm for backrub motions in protein design. Bioinformatics. 2008;24(13):i196–204. doi:10.1093/bioinformatics/btn169
- 18. Hallen MA, Donald BR. CATS (Coordinates of Atoms by Taylor Series): protein design with backbone flexibility in all locally feasible directions. Bioinformatics. 2017;33(14):i5–i12. doi:10.1093/bioinformatics/btx277
- 19. Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): a provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins. 2013;81(1):18–39. doi:10.1002/prot.24150
- 20. Tzeng SR, Kalodimos CG. Protein activity regulation by conformational entropy. Nature. 2012;488(7410):236.
- 21. Gilson MK, Given JA, Bush BL, McCammon JA. The statistical-thermodynamic basis for computation of binding affinities: a critical review. Biophys J. 1997;72(3):1047–69. doi:10.1016/S0006-3495(97)78756-3
- 22. Chen CY, Georgiev I, Anderson AC, Donald BR. Computational structure-based redesign of enzyme activity. Proc Natl Acad Sci USA. 2009;106(10):3764–9. doi:10.1073/pnas.0900266106
- 23. Sciretti D, Bruscolini P, Pelizzola A, Pretti M, Jaramillo A. Computational protein design with side-chain conformational entropy. Proteins. 2009;74(1):176–91.
- 24. Georgiev I, Lilien RH, Donald BR. Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design. Bioinformatics. 2006;22(14):e174–83.
- 25. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997;278(5335):82–7.
- 26. Leach AR, Lemon AP. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins. 1998;33(2):227–39.
- 27. Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, et al. A new framework for computational protein design through cost function network optimization. Bioinformatics. 2013;29(17):2129–36.
- 28. Chazelle B, Kingsford C, Singh M. A Semidefinite Programming Approach to Side Chain Positioning with New Rounding Strategies. INFORMS Journal on Computing. 2004;16(4):380–392.
- 29. Lilien RH, Stevens BW, Anderson AC, Donald BR. A novel ensemble-based scoring and search algorithm for protein redesign and its application to modify the substrate specificity of the gramicidin synthetase a phenylalanine adenylation enzyme. J Comput Biol. 2005;12(6):740–61.
- 30. Roberts KE, Cushing PR, Boisguerin P, Madden DR, Donald BR. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS Comput Biol. 2012;8(4):e1002477. doi:10.1371/journal.pcbi.1002477
- 31. Silver NW, King BM, Nalam MNL, Cao H, Ali A, Kiran Kumar Reddy GS, et al. Efficient Computation of Small-Molecule Configurational Binding Entropy and Free Energy Changes by Ensemble Enumeration. J Chem Theory Comput. 2013;9(11):5098–5115. doi:10.1021/ct400383v
- 32. Ojewole AA, Jou JD, Fowler VG, Donald BR. BBK* (Branch and Bound over K*): A Provable and Efficient Ensemble-Based Algorithm to Optimize Stability and Binding Affinity over Large Sequence Spaces. Research in Computational Molecular Biology (RECOMB). 2017; p. 157–172.
- 33. Viricel C, Simoncini D, Barbe S, Schiex T. Guaranteed weighted counting for affinity computation: Beyond determinism and structure. In: International Conference on Principles and Practice of Constraint Programming. Springer; 2016. p. 733–750.
- 34. Traoré S, Allouche D, André I, Schiex T, Barbe S. Deterministic Search Methods for Computational Protein Design. Methods Mol Biol. 2017;1529:107–123.
- 35. Traoré S, Roberts KE, Allouche D, Donald BR, André I, Schiex T, et al. Fast search algorithms for computational protein design. J Comput Chem. 2016;37(12):1048–58. doi:10.1002/jcc.24290
- 36. Stevens BW, Lilien RH, Georgiev I, Donald BR, Anderson AC. Redesigning the PheA domain of gramicidin synthetase leads to a new understanding of the enzyme’s mechanism and selectivity. Biochemistry. 2006;45(51):15495–504.
- 37. Frey KM, Georgiev I, Donald BR, Anderson AC. Predicting resistance mutations using protein design algorithms. Proc Natl Acad Sci U S A. 2010;107(31):13707–12. doi:10.1073/pnas.1002162107
- 38. Reeve SM, Gainza P, Frey KM, Georgiev I, Donald BR, Anderson AC. Protein design algorithms predict viable resistance to an experimental antifolate. Proc Natl Acad Sci U S A. 2015;112(3):749–54. doi:10.1073/pnas.1411548112
- 39. Gorczynski MJ, Grembecka J, Zhou Y, Kong Y, Roudaia L, Douvas MG, et al. Allosteric inhibition of the protein-protein interaction between the leukemia-associated proteins Runx1 and CBFbeta. Chem Biol. 2007;14(10):1186–97.
- 40. Georgiev I, Schmidt S, Li Y, Wycuff D, Ofek G, Doria-Rose N, et al. Design of epitope-specific probes for sera analysis and antibody isolation. Retrovirology. 2012;9.
- 41. Georgiev IS, Rudicell RS, Saunders KO, Shi W, Kirys T, McKee K, et al. Antibodies VRC01 and 10E8 neutralize HIV-1 with high breadth and potency even with IG-framework regions substantially reverted to germline. J Immunol. 2014;192(3):1100–6. doi:10.4049/jimmunol.1302515
- 42. Rudicell RS, Kwon YD, Ko SY, Pegu A, Louder MK, Georgiev IS, et al. Enhanced potency of a broadly neutralizing HIV-1 antibody in vitro improves protection against lentiviral infection in vivo. J Virol. 2014;88(21):12669–82. doi:10.1128/JVI.02213-14
- 43. A Phase 1, Single Dose Study of the Safety and Virologic Effect of an HIV-1 Specific Broadly Neutralizing Human Monoclonal Antibody, VRC-HIVMAB080-00-AB (VRC01LS) or VRC-HIVMAB075-00-AB (VRC07-523LS), Administered Intravenously to HIV-Infected Adults. ClinicalTrials.gov Identifier: NCT02840474. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT02840474
- 44. Evaluating the Safety and Serum Concentrations of a Human Monoclonal Antibody, VRC-HIVMAB075-00-AB (VRC07-523LS), Administered in Multiple Doses and Routes to Healthy, HIV-uninfected Adults. ClinicalTrials.gov Identifier: NCT03387150. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT03387150
- 45. VRC 610: Phase I Safety and Pharmacokinetics Study to Evaluate a Human Monoclonal Antibody (MAB) VRC-HIVMAB095-00-AB (10E8VLS) Administered Alone or Concurrently With MAB VRC-HIVMAB075-00-AB (VRC07-523LS) Via Subcutaneous Injection in Healthy Adults. ClinicalTrials.gov Identifier: NCT03565315. NIAID And National Institutes of Health Clinical Center. September (2018). https://clinicaltrials.gov/ct2/show/NCT03565315
- 46. Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences. 2000;97(19):10383–10388.
- 47. Nassar N, Horn G, Herrmann C, Block C, Janknecht R, Wittinghofer A. Ras/Rap effector specificity determined by charge reversal. Nature Structural and Molecular Biology. 1996;3(8):723.
- 48. Fridman M, Maruta H, Gonez J, Walker F, Treutlein H, Zeng J, et al. Point mutants of c-raf-1 RBD with elevated binding to v-Ha-Ras. Journal of Biological Chemistry. 2000;275(39):30363–30371.
- 49. Kiel C, Filchtinski D, Spoerner M, Schreiber G, Kalbitzer HR, Herrmann C. Improved binding of Raf to Ras·GDP is correlated with biological activity. Journal of Biological Chemistry. 2009;284(46):31893–31902. 10.1074/jbc.M109.031153 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Sydor JR, Seidel RP, Goody RS, Engelhard M. Cell-free synthesis of the Ras-binding domain of c-Raf-1: binding studies to fluorescently labelled H-Ras. FEBS letters. 1999;452(3):375–378. [DOI] [PubMed] [Google Scholar]
- 51. Herrmann C, Horn G, Spaargaren M, Wittinghofer A. Differential interaction of the ras family GTP-binding proteins H-Ras, Rap1A, and R-Ras with the putative effector molecules Raf kinase and Ral-guanine nucleotide exchange factor. Journal of Biological Chemistry. 1996;271(12):6794–6800. [DOI] [PubMed] [Google Scholar]
- 52. Herrmann C, Martin GA, Wittinghofer A. Quantitative analysis of the complex between p21 and the ras-binding domain of the human raf-1 protein kinase. Journal of Biological Chemistry. 1995;270(7):2901–2905. [DOI] [PubMed] [Google Scholar]
- 53. Lakshman B, Messing S, Schmid EM, Clogston JD, Gillette WK, Esposito D, et al. Quantitative biophysical analysis defines key components modulating recruitment of the GTPase KRAS to the plasma membrane. Journal of Biological Chemistry. 2019;294(6):2193–2207. 10.1074/jbc.RA118.005669 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Block C, Janknecht R, Herrmann C, Nassar N, Wittinghofer A. Quantitative structure-activity analysis correlating Ras/Raf interaction in vitro to Raf activation in vivo. Nature Structural Biology. 1996;3(3):244.
- 55. Campbell-Valois FX, Tarassov K, Michnick S. Massive sequence perturbation of the Raf Ras binding domain reveals relationships between sequence conservation, secondary structure propensity, hydrophobic core organization and stability. Journal of Molecular Biology. 2006;362(1):151–171.
- 56. Fridman M, Walker F, Catimel B, Domagala T, Nice E, Burgess A. c-Raf-1 RBD associates with a subset of active vH-Ras. Biochemistry. 2000;39(50):15603–15611.
- 57. Fetics SK, Guterres H, Kearney BM, Buhrman G, Ma B, Nussinov R, et al. Allosteric effects of the oncogenic RasQ61L mutant on Raf-RBD. Structure. 2015;23(3):505–516.
- 58. Gorman C, Skinner RH, Skelly JV, Neidle S, Lowe PN. Equilibrium and kinetic measurements reveal rapidly reversible binding of Ras to Raf. Journal of Biological Chemistry. 1996;271(12):6713–6719.
- 59. Hunter JC, Manandhar A, Carrasco MA, Gurbani D, Gondi S, Westover KD. Biochemical and structural analysis of common cancer-associated KRAS mutations. Molecular Cancer Research. 2015;13(9):1325–1335.
- 60. Filchtinski D, Sharabi O, Rüppel A, Vetter IR, Herrmann C, Shifman JM. What makes Ras an efficient molecular switch: a computational, biophysical, and structural study of Ras-GDP interactions with mutants of Raf. Journal of Molecular Biology. 2010;399(3):422–435.
- 61. Lee J. New Monte Carlo algorithm: entropic sampling. Physical Review Letters. 1993;71(2):211.
- 62. Nosé S. A molecular dynamics method for simulations in the canonical ensemble. Molecular Physics. 1984;52(2):255–268.
- 63. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970.
- 64. Lou Q, Dechter R, Ihler AT. Dynamic Importance Sampling for Anytime Bounds of the Partition Function. In: Advances in Neural Information Processing Systems; 2017. p. 3196–3204.
- 65. Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins. 2015;83(10):1859–1877. doi:10.1002/prot.24870
- 66. Hallen MA, Donald BR. COMETS (Constrained Optimization of Multistate Energies by Tree Search): A provable and efficient protein design algorithm to optimize binding affinity and specificity with respect to sequence. Journal of Computational Biology. 2016;23(5):311–321. doi:10.1089/cmb.2015.0188
- 67. Sommer R, Wagner S, Varrot A, Nycholat CM, Khaledi A, Häussler S, et al. The virulence factor LecB varies in clinical isolates: consequences for ligand binding and drug discovery. Chemical Science. 2016;7(8):4990–5001. doi:10.1039/c6sc00696e
- 68. Jou JD, Holt GT, Lowegard AU, Donald BR. Minimization-Aware Recursive K* (MARK*): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. In: International Conference on Research in Computational Molecular Biology. Springer; 2019. p. 101–119.
- 69. Fridman M, Tikoo A, Varga M, Murphy A, Nur-E-Kamal M, Maruta H. The minimal fragments of c-Raf-1 and NF1 that can suppress v-Ha-Ras-induced malignant phenotype. Journal of Biological Chemistry. 1994;269(48):30105–30108.
- 70. Campbell-Valois FX, Tarassov K, Michnick S. Massive sequence perturbation of a small protein. Proceedings of the National Academy of Sciences. 2005;102(42):14988–14993.
- 71. Roberts KE. Protein Interaction Viewer. 2012. http://www.cs.duke.edu/donaldlab/software/proteinInteractionViewer/
- 72. Ylera F, Harth S, Waldherr D, Frisch C, Knappik A. Off-rate screening for selection of high-affinity anti-drug antibodies. Analytical Biochemistry. 2013;441(2):208–213.
- 73. Perspicace S, Banner D, Benz J, Müller F, Schlatter D, Huber W. Fragment-based screening using surface plasmon resonance technology. Journal of Biomolecular Screening. 2009;14(4):337–349.
- 74. Lad L, Clancy S, Kovalenko M, Liu C, Hui T, Smith V, et al. High-throughput kinetic screening of hybridomas to identify high-affinity antibodies using bio-layer interferometry. Journal of Biomolecular Screening. 2015;20(4):498–507.
- 75. Sun Q, Burke JP, Phan J, Burns MC, Olejniczak ET, Waterson AG, et al. Discovery of small molecules that bind to K-Ras and inhibit Sos-mediated activation. Angewandte Chemie International Edition. 2012;51(25):6140–6143. doi:10.1002/anie.201201358