Summary
Understanding the relationship between protein sequence and structure well enough to design new proteins with desired functions is a longstanding goal in protein science. Here, we show that recurring tertiary structural motifs (TERMs) in the PDB provide rich information for protein-peptide interaction prediction and design. TERM statistics can be used to predict peptide binding energies for Bcl-2 family proteins as accurately as widely used structure-based tools. Furthermore, design using TERM energies (dTERMen) rapidly and reliably generates high-affinity peptide binders of anti-apoptotic proteins Bfl-1 and Mcl-1 with just 15 - 38% sequence identity to any known native Bcl-2 family protein ligand. High-resolution structures of four designed peptides bound to their targets provided opportunities to analyze strengths and limitations of the computational design method. Our results support dTERMen as a powerful approach that can complement existing tools for protein engineering.
Keywords: Bcl-2 proteins, protein-protein interactions, apoptosis, structure-based design, interaction specificity
Graphical Abstract
Structure-based prediction or design of protein-protein interactions typically evaluates atomistic models using a physical or statistical energy function. Frappier et al. demonstrate an alternative approach in which the sequence statistics of conserved tertiary structural motifs are used to predict and design peptide interactions with Bcl-2 proteins that control apoptosis.
Introduction
Protein-protein interactions are central to nearly all biological processes and contribute to pathology in human diseases. Reagents that can disrupt protein interactions are highly sought for basic research and therapeutic development, but protein interfaces are difficult to target. For example, large binding sites with multiple, widely spaced hotspots or flat interfaces that lack pockets are not easy to disrupt with small molecules. Antibodies and nanobodies can block protein interactions and have the advantage, relative to small molecules, of binding to larger interfaces. However, the difficulty of delivering large molecules into the cell, and the low stability of some antibody-derived agents in the reducing environment of the cytoplasm, restrict applications to extracellular targets or targets in chemically permeabilized cells (Chames et al., 2009).
Peptides provide a complementary approach to targeting protein interfaces. Peptide-protein interactions are ubiquitous in nature, where there are many examples of short segments binding to large, structurally complex surfaces (Frappier et al., 2018; Tompa et al., 2014). Peptides can be delivered into cells by chemically modifying them to increase hydrophobicity and to hide hydrogen bonds and negative charges (Bird et al., 2016; Rezaei Araghi et al., 2018; Walensky and Bird, 2014), conjugating them to transduction domains (such as cell-penetrating peptides) (Nischan et al., 2015; Qian et al., 2016), or delivering them using nanoparticles (Hiraki et al., 2018; Kumar et al., 2014). Nevertheless, there are obstacles to developing peptide inhibitors. Peptides derived from naturally occurring sequences have non-optimal pharmacological properties because they did not evolve to function as therapeutics. Furthermore, native ligands often have a binding affinity or specificity profile different from what is desired for a given application. Sequence optimization is typically required to minimize off-target binding, decrease protease sensitivity, reduce immunogenicity, and improve pharmacokinetics. Because we lack the ability to predict pharmacological potential a priori, an ability to rapidly design diverse sequences that tightly bind and inhibit a target would be transformative for the development of peptide therapeutics.
Current approaches for discovering peptide inhibitors have limitations. State-of-the-art methods rely heavily on experimental screening, and screening for the “best” binders in a population does not typically provide diverse leads. Rational design, e.g., using computational models to search sequence-structure space on a larger scale, can guide screens to sequences unrelated to those in nature. However, given the essentially infinite space to explore, and the difficulty of predicting the best binders, the success rates of rational, structure-based methods have been low (Arkadash et al., 2017; Berger et al., 2016; Procko et al., 2014; Roberts et al., 2012).
Recent methodological developments have shown that mining sequence-structure relationships from the Protein Data Bank (PDB) has the potential to improve the efficiency and efficacy of structure-based modeling and design (DeBartolo et al., 2012; DeBartolo et al., 2014; Feng and Barth, 2016; Fernandez-Fuentes et al., 2006; Mackenzie and Grigoryan, 2017). It has long been recognized that proteins are composed of recurring structural elements (Jacobs et al., 2016; Vanhee et al., 2011). The large number of solved structures now makes it possible to compile a finite, yet near-complete, list of the recurring tertiary structural motifs (here called TERMs) needed to construct any protein structure (Mackenzie et al., 2016). Recent analyses demonstrate that TERMs have characteristic sequence preferences that can be detected by statistical analysis of solved structures (Zheng and Grigoryan, 2017). These observations provide the foundation for a formalism that can quantify the compatibility of any sequence with any structural scaffold (Zhou et al., 2018).
TERM-based analyses have demonstrated utility for challenging modeling tasks. For example, a statistical analysis of TERM sequences is effective at discriminating between good and poor structure predictions, on par with or exceeding leading model quality assessment metrics (Zheng et al., 2015a). TERM sequence statistics also capture aspects of protein thermodynamics and can predict stability changes upon mutation as well as, or better than, state-of-the-art physics-based or statistical methods (Zheng and Grigoryan, 2017). Finally, TERM-based sequence-structure relationships can be applied to protein design. Choosing optimal sequences for native backbones, based solely on statistics of constituent TERMs, leads to native-like sequences and rationalizes observed evolutionary variation (Mackenzie et al., 2016). Zhou et al. recently described and extensively benchmarked a TERM-based design method, called dTERMen (design with TERM energies), demonstrating that it is predictive with respect to available data and can generate sequences that fold to the intended structure (Zhou et al., 2018). So far, TERM-based methods have not been applied to predicting or designing protein interactions.
In this work, we tested the ability of dTERMen to analyze and re-design peptide binders of anti-apoptotic proteins Bfl-1 and Mcl-1. Along with paralogs Bcl-2, Bcl-xL, and Bcl-w, these proteins promote cellular survival by binding and sequestering pro-apoptotic proteins. Mcl-1 and Bfl-1 have established roles in cancer cell survival and resistance to chemotherapy (Hiraki et al., 2018; Opferman, 2016). Although blocking Bcl-2 protein binding to pro-apoptotic partners is a validated therapeutic strategy (Cang et al., 2015; Souers et al., 2013), there are no clinically approved inhibitors of Bfl-1 or Mcl-1 at this time. Small molecules, peptides, and mini-proteins have been described as potential inhibitor leads (Berger et al., 2016; Dutta et al., 2013; Foight et al., 2014; Jenson et al., 2017; Kotschy et al., 2016; Rezaei Araghi et al., 2018).
Recently, the affinities of a large number of peptides for Bcl-xL, Mcl-1 and Bfl-1 were measured and reported (Jenson et al., 2018). In this work, we used the high-throughput interaction data to quantify the prediction performance of dTERMen and compared it with the performance of widely used methods Rosetta and FoldX. Based on the good results obtained, we turned to protein design and tested the ability of dTERMen to generate peptide binders of Bcl-2 family proteins Bfl-1 and Mcl-1. Our successes validate dTERMen as a promising approach that differs from existing design methods and that can be applied for rapid discovery of diverse, high-affinity peptide ligands.
Results
Bcl-2 family proteins Bcl-2, Bcl-xL, Bcl-w, Bfl-1 and Mcl-1 bind to Bcl-2 homology 3 (BH3) motifs within their interaction partners. The short ~23-residue BH3 motif folds into an alpha helix upon binding. Below, we refer to positions in BH3 peptides using a heptad notation, defined in Table S1. In this notation, positions 2d, 3a, 3d and 4a are typically hydrophobic, position 3a is conserved as leucine in native BH3 motifs, position 3e is conserved as a small amino acid, and position 3f is conserved as aspartate.
dTERMen can predict peptide binding affinities for Bcl-2 proteins
We tested dTERMen on a variety of BH3-binding prediction tasks. We used a dataset consisting of 4488, 4648 and 3948 measurements of BH3 peptides binding to Bcl-xL, Mcl-1 and Bfl-1, respectively (Jenson et al., 2018). The peptides were 23 residues in length and contained between 1 and 8 mutations made in the background of the BH3 sequences of human pro-apoptotic proteins BIM or PUMA. Affinity values were obtained using amped SORTCERY, a high-throughput method for quantifying dissociation constants for peptides displayed on the surface of Saccharomyces cerevisiae (Jenson et al., 2018; Reich et al., 2015). Using this assay, thousands of peptides were determined to have apparent cell-surface dissociation constants ranging from 0.1 to 320 nM, with some peptides classified simply as binding tighter or weaker than the extremes of this range.
Using the amped SORTCERY data, we defined three tasks of increasing difficulty. The easiest task was to discriminate the tightest 20% of binders from the weakest 20%, for a particular target protein. We also defined an enrichment task, which involved identifying the tightest 10% of binders and, finally, the difficult task of predicting quantitative affinities within a 5 kcal/mol range in apparent binding energies. For these tests, we compared the performance of dTERMen with that of commonly used methods Rosetta (Alford et al., 2017; Lewis and Kuhlman, 2011) and FoldX (Schymkowitz et al., 2005).
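As a rough check on the stated energy range, dissociation constants can be converted to binding free energies with ΔG = RT ln(Kd). The sketch below assumes a temperature of 298.15 K and a 1 M standard state, neither of which is specified in the text; the function name is illustrative only.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) for a dissociation constant given in M.

    The temperature is an assumption; the source does not state one.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# Extremes of the quantifiable range reported for the assay (0.1 to 320 nM).
tightest = binding_free_energy(0.1e-9)
weakest = binding_free_energy(320e-9)
energy_span = weakest - tightest

print(f"tightest: {tightest:.2f} kcal/mol")
print(f"weakest:  {weakest:.2f} kcal/mol")
print(f"span:     {energy_span:.2f} kcal/mol")
```

Under these assumptions the 0.1 to 320 nM window spans a little under 5 kcal/mol, consistent with the range quoted for the quantitative prediction task.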
As input to the prediction calculations, we used experimental structures of Bcl-2 protein-peptide complexes. Querying the PDB and filtering for peptides at least 20 amino acids long yielded 15, 6 and 25 protein-peptide complexes for Bcl-xL, Bfl-1 and Mcl-1, respectively (Table S2). An analysis of the BH3 peptides in these complexes revealed that they all adopt a similar binding mode, with average pairwise Cα RMSD of 1.64 ± 0.85 Å for the binding interface, defined as Cα atoms of the peptide and surrounding protein residues (Table S3).
We tested whether different modeling approaches could discriminate high affinity binders from peptides that were not observed to bind or that bound weakly. Table 1 reports the average performance of each method over all structural templates. Binary classification of tight vs. weak binders is reported as the area under the receiver operating characteristic curve (AUC). Performance averaged for all protein targets shows that dTERMen (AUCavg = 0.78) has similar predictive power to the other scoring methods, Rosetta (AUCavg = 0.75) and FoldX (AUCavg = 0.75). The small difference is driven by performance on the Bcl-xL dataset, for which dTERMen (AUCavg = 0.75) is better than Rosetta (AUCavg = 0.69) and FoldX (AUCavg = 0.68). For the task of predicting quantitative binding energies, performance averaged for all protein targets shows that dTERMen (Ravg = 0.37), Rosetta (Ravg = 0.34) and FoldX (Ravg = 0.31) gave similar performance, with dTERMen outperforming the other methods for Bcl-xL.
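The two metrics reported above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: it assumes that lower energies mean tighter binding for both the experimental and the predicted values, computes the AUC with the rank-based (Mann-Whitney) formulation, and uses invented toy numbers. All function names are hypothetical.

```python
import math


def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def tight_vs_weak_auc(experimental, predicted, fraction=0.2):
    """AUC for separating the tightest `fraction` of binders from the weakest `fraction`."""
    order = sorted(range(len(experimental)), key=lambda i: experimental[i])
    k = max(1, int(round(len(order) * fraction)))
    tight, weak = order[:k], order[-k:]
    # Negate predicted energies so that a higher score means a predicted tighter binder.
    return roc_auc([-predicted[i] for i in tight], [-predicted[i] for i in weak])


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented): experimental binding energies and model scores for 10 peptides.
experimental = [-13.0, -12.5, -12.0, -11.5, -11.0, -10.5, -10.0, -9.5, -9.0, -8.5]
predicted = [-5.0, -3.0, -4.0, -2.5, -3.5, -1.0, -2.0, -1.5, -3.2, 0.5]

auc = tight_vs_weak_auc(experimental, predicted)
r = pearson(experimental, predicted)
print(f"AUC = {auc:.2f}, Pearson R = {r:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.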
Table 1. Interaction prediction performance averaged over all templates. [a]
 | AUC [b] | | | | Affinity correlation [c] | | | | Enrichment of the top 10% of binders (%) [d] | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Target | Bcl-xL | Bfl-1 | Mcl-1 | Mean [e] | Bcl-xL | Bfl-1 | Mcl-1 | Mean [e] | Bcl-xL | Bfl-1 | Mcl-1 | Mean [e] |
FoldX | 0.68 ± 0.11 (3) [f] | 0.75 ± 0.06 (2) | 0.82 ± 0.08 (6) | 0.75 ± 0.06 (14) | 0.23 ± 0.11 (1) | 0.34 ± 0.07 (2) | 0.37 ± 0.11 (3) | 0.31 ± 0.06 (9) | 20.3 ± 6.6 (1) | 19.9 ± 4.0 (0) | 23.4 ± 8.2 (0) | 21.4 ± 1.8 (1) |
Rosetta | 0.69 ± 0.05 (1) | 0.72 ± 0.02 (2) | 0.85 ± 0.03 (12) | 0.75 ± 0.07 (15) | 0.24 ± 0.06 (1) | 0.32 ± 0.03 (2) | 0.45 ± 0.04 (16) | 0.34 ± 0.09 (19) | 20.5 ± 3.8 (1) | 18.96 ± 3.6 (0) | 24.44 ± 4.4 (13) | 24.3 ± 6.7 (14) |
dTERMen | 0.75 ± 0.06 (11) | 0.73 ± 0.03 (2) | 0.83 ± 0.06 (7) | 0.78 ± 0.04 (20) | 0.35 ± 0.07 (13) | 0.32 ± 0.04 (2) | 0.44 ± 0.10 (6) | 0.37 ± 0.05 (21) | 29.9 ± 4.9 (13) | 27.5 ± 1.4 (6) | 35.2 ± 35.1 (12) | 30.9 ± 3.2 (34) |
[a] Average and standard deviation of the reported performance metric over all templates.
[b] Area under the ROC curve for discriminating the top 20% of binders from the bottom 20%.
[c] Pearson correlation between predicted and experimental binding energy.
[d] Percentage of top 10% of binders found in the predicted top 10% of binders.
[e] Average and standard deviation of performance over all three proteins.
[f] The number in parentheses is the number of templates for which the indicated method gave the best performance, out of the three methods tested. The total number of templates was 15 for Bcl-xL, 6 for Bfl-1 and 25 for Mcl-1.
Many applications seek the tightest binding partners for a protein target. We used each method to rank the 4386, 4491 or 3805 sequences that had measured affinities for Bcl-xL, Mcl-1 or Bfl-1. We then examined the top 10% of computationally ranked sequences to determine what proportion of the top 10% of experimental binders were captured. Overall, dTERMen had better enrichment performance than Rosetta and FoldX (31% vs. 24% and 21%). We tested predictions using 46 different input structures, and dTERMen had the best enrichment performance for 34 of these cases. dTERMen had the best enrichment score on all of the Bfl-1 templates and 13 of the 15 Bcl-xL templates.
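The enrichment measure described here is the overlap between the experimentally tightest 10% of peptides and the 10% ranked best by a model. A minimal sketch, assuming lower values mean tighter binding in both lists; the function name and toy data are hypothetical.

```python
def top_fraction_enrichment(experimental, predicted, fraction=0.1):
    """Percentage of the experimental top `fraction` recovered in the predicted top `fraction`."""
    n = len(experimental)
    k = max(1, int(round(n * fraction)))
    top_experimental = set(sorted(range(n), key=lambda i: experimental[i])[:k])
    top_predicted = set(sorted(range(n), key=lambda i: predicted[i])[:k])
    return 100.0 * len(top_experimental & top_predicted) / k


# Toy data (invented): 20 peptides, so the top 10% contains 2 peptides.
experimental = [float(i) for i in range(20)]
predicted = list(experimental)
predicted[1] = 50.0  # the model badly mis-ranks the second-tightest binder

enrichment = top_fraction_enrichment(experimental, predicted)
print(f"{enrichment:.0f}% of the top binders recovered")
```

For comparison, a random ranking is expected to recover about 10% of the top binders on average under this definition.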
The results in Table S2 show that predictive power varies significantly as a function of the template used for modeling. For example, FoldX predictions for the Bcl-xL dataset resulted in AUC values from 0.39 to 0.82, depending on template choice. In one case, multiple protein complexes in the asymmetric unit of crystal structure 5C6H, with an average pairwise binding interface RMSD of 0.69 Å, gave AUC values from 0.65 to 0.82. It is not surprising that binding affinity predictions depend on the input template structures, particularly for dTERMen and FoldX, which do not perform template structure relaxation. But there is no reliable way to know, a priori, which template will give the best agreement with experiments. We found no relationship between structure resolution and performance for any of the methods (negative correlation expected, average observed Pearson R = 0.18 ± 0.34; Table S2). A computationally tractable approach is to use the template that results in the lowest predicted energy for each sequence. Table 2 shows that performance improved for all methods when the lowest energy for each sequence, over all templates, was used. FoldX performance improved the most; FoldX mean AUC increased from 0.75 to 0.85, the mean Pearson correlation for binding affinity values improved from 0.31 to 0.47, and mean enrichment of top binders increased from 21% to 29%.
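The lowest-energy-over-templates strategy amounts to a per-sequence minimum. The sketch below is illustrative only: the template labels, peptide names and energies are invented, and it assumes that energies computed on different templates of the same target are directly comparable.

```python
def best_energy_per_sequence(energies_by_template):
    """Map each sequence to (lowest energy, template that produced it).

    `energies_by_template` maps a template identifier to a dict of
    sequence -> predicted binding energy (lower is better).
    """
    best = {}
    for template_id, energies in energies_by_template.items():
        for sequence, energy in energies.items():
            if sequence not in best or energy < best[sequence][0]:
                best[sequence] = (energy, template_id)
    return best


# Toy data (invented): two hypothetical templates scoring three peptides.
energies_by_template = {
    "template_A": {"pep1": -4.2, "pep2": -1.0, "pep3": -2.5},
    "template_B": {"pep1": -3.1, "pep2": -2.7, "pep3": -2.5},
}

best = best_energy_per_sequence(energies_by_template)
for sequence, (energy, template_id) in sorted(best.items()):
    print(sequence, energy, template_id)
```

The resulting per-sequence minima can then be passed to the same AUC, correlation and enrichment calculations used for single templates.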
Table 2.
 | AUC | | | | Affinity correlation | | | | Enrichment of the top 10% of binders (%) | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Target | Bcl-xL | Bfl-1 | Mcl-1 | Mean | Bcl-xL | Bfl-1 | Mcl-1 | Mean | Bcl-xL | Bfl-1 | Mcl-1 | Mean |
FoldX | 0.83 | 0.83 | 0.93 | 0.85 ± 0.05 | 0.39 | 0.46 | 0.56 | 0.47 ± 0.07 | 27.4 | 27.3 | 32.7 | 29.1± 2.5 |
Rosetta | 0.70 | 0.76 | 0.92 | 0.78 ± 0.09 | 0.25 | 0.33 | 0.53 | 0.37 ± 0.12 | 20.1 | 20.0 | 38.1 | 26.0± 8.5 |
dTERMen | 0.77 | 0.74 | 0.89 | 0.80 ± 0.06 | 0.38 | 0.33 | 0.50 | 0.41 ± 0.07 | 35.3 | 29.2 | 43.3 | 35.6± 5.8 |
Notes for Table 1 apply
We were struck by the strong dependence of predicted binding affinities on the choice of template structure and thought this might be an area where dTERMen could provide an advantage. The robustness of prediction performance to small differences in input structures was evaluated using 294 pairs of structures of the same protein complex with binding interface Cα atom RMSD < 1 Å. For each pair, we computed the correlation of predicted binding energies for all peptides with measured dissociation constants. The results are shown in Fig. 1. On average, dTERMen (Ravg = 0.77) is much less sensitive to small differences in input template than FoldX (Ravg = 0.55). When run with default options, the Rosetta “relax” protocol is slightly more robust than FoldX (Rosetta Ravg = 0.60), and further structural sampling could, at least in theory, lead to a convergence of the Rosetta predictions made using different templates, albeit at a higher cost in computing time.
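The robustness analysis can be sketched as follows: for every pair of templates whose binding interfaces differ by less than the RMSD cutoff, correlate the energies predicted for the same peptides, then average over pairs. The data layout, function names and toy numbers below are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def template_robustness(predictions, interface_rmsd, cutoff=1.0):
    """Mean correlation of predictions over template pairs with interface RMSD < cutoff.

    `predictions` maps template id -> list of energies (same peptide order for all).
    `interface_rmsd` maps frozenset({id_a, id_b}) -> RMSD in Angstroms.
    Returns (mean correlation, number of pairs used).
    """
    correlations = [
        pearson(predictions[a], predictions[b])
        for a, b in combinations(sorted(predictions), 2)
        if interface_rmsd[frozenset((a, b))] < cutoff
    ]
    if not correlations:
        raise ValueError("no template pairs below the RMSD cutoff")
    return sum(correlations) / len(correlations), len(correlations)


# Toy data (invented): three hypothetical templates scoring the same four peptides.
predictions = {
    "t1": [1.0, 2.0, 3.0, 4.0],
    "t2": [1.1, 2.3, 2.9, 4.2],
    "t3": [4.0, 3.0, 2.0, 1.0],
}
interface_rmsd = {
    frozenset(("t1", "t2")): 0.4,
    frozenset(("t1", "t3")): 1.6,
    frozenset(("t2", "t3")): 1.2,
}

mean_r, n_pairs = template_robustness(predictions, interface_rmsd)
print(f"mean R = {mean_r:.2f} over {n_pairs} pair(s)")
```

A mean correlation near 1 indicates that near-identical input structures yield near-identical rankings.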
Peptides designed using dTERMen bind to Mcl-1 and Bfl-1 with high affinity and variable specificity
Because dTERMen performed well in prediction tests, we reasoned that it might be useful for designing peptide binders. Given a template structure, dTERMen can solve for the optimal sequence to fit on the template. In this work, we designed a sequence to fit on the peptide chain in a template complex, in the context of the fixed sequences of the protein target. We chose 5 structures as design templates: two Bfl-1 complexes and three Mcl-1 complexes (Table S4). Templates were chosen to sample structural diversity, because designing on different templates provides access to different sequences (Fig. S2).
For Bfl-1-targeted designs, we selected structures of Bfl-1 bound to the natural ligand PUMA (PDB ID 5UUL) and to a Bfl-1 selective peptide (FS2) identified previously (PDB ID 5UUK) (Jenson et al., 2017). Because PUMA and FS2 peptides are shifted 1.2 Å and rotated 17° relative to one another in the Bfl-1 binding pocket, we expected differences in the sequences identified by dTERMen for these two templates. To target Mcl-1, we used structures of Mcl-1 bound to BIM (PDB ID 2PQK) and to a chemically crosslinked variant of BID, BID-MM (PDB ID 5C3F) (Fire et al., 2010; Miles et al., 2016). The binding modes of BIM and BID-MM are similar (peptide RMSD = 0.76 Å when superimposing the binding interface), but the Mcl-1 protein has differences in the binding pocket in the two structures (binding site RMSD = 1.13 Å). We also used a structure of Mcl-1 bound to peptide FS2, which has low affinity for Mcl-1 (Kd > 3 μM) but engages the protein in a unique binding pose (PDB ID 5UUM) (Jenson et al., 2017).
Peptide sequences were designed on each of the templates using dTERMen. Preliminary calculations showed that the designed sequences included medium-sized hydrophobic residues at 3a and negatively charged residues at 3f, similar to the conserved leucine and aspartate residues at these positions in native BH3 motifs (Table S1). However, dTERMen-designed sequences did not preserve native trends at position 4b. This position in native BH3 peptides is often asparagine, aspartate or histidine, which can serve as an N-terminal helix cap for helix 5 of Mcl-1 or Bfl-1. We noticed that dTERMen chose a variety of amino acids at this position (Lys, Glu, Ser, Ala, Val, Tyr, and Thr). To explore the reason for this, we extracted the N-terminal helix-capping motif from each template (see Fig. S3) and recovered closely matching backbone geometries from the PDB. To our surprise, whereas matches made to Bcl-2 family proteins exhibited a strong preference for asparagine or aspartate at the capping position, the frequency of capping residues in other matches was considerably lower (e.g., on average, 6% and 10% for asparagine or aspartate, respectively, in the top ~600 non-Bcl-2 homologous matches). While it was unclear whether or not a capping residue at position 4b is required, we chose to fix this position to its native identity in each design template. BH3 residue 3b can also make a helix-capping interaction, and in this case, we fixed the wild-type amino-acid identity in half of the designs (dF1-dF4, dM1-dM4) but used no constraint for the other half (dF5-dF8, dM5-dM8). Two sequences were designed on 5UUM, without any sequence constraints.
Table 3 shows the optimal designed peptide sequence (the provably best-scoring sequence, given the constraints) for each template structure. For many of the designs, re-packing the protein and peptide side chains on the rigid-backbone design template showed predicted steric clashes. We used PyMOL to visualize regions of possible over-packing (Fig. S4). Because some backbone relaxation is expected, and because the dTERMen scoring function predicted that the designed sequences are compatible with structures closely related to the design templates, we did not filter the designs using any kind of clash criterion.
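To make concrete what a best-scoring sequence under constraints means, the toy sketch below enumerates every sequence for a three-position problem and keeps the one with the lowest energy, with and without a fixed position. It assumes, purely for illustration, an energy that decomposes into per-position and pairwise terms; the energy values, amino-acid choices and function names are invented, and the text does not state how dTERMen locates its optimum. Exhaustive enumeration is only feasible for tiny examples like this one.

```python
from itertools import product


def total_energy(sequence, self_energy, pair_energy):
    """Sum of per-position terms plus any tabulated pairwise terms (missing pairs count as 0)."""
    energy = sum(self_energy[i][aa] for i, aa in enumerate(sequence))
    for (i, j), table in pair_energy.items():
        energy += table.get((sequence[i], sequence[j]), 0.0)
    return energy


def design(self_energy, pair_energy, fixed=None):
    """Exhaustively find the lowest-energy sequence; `fixed` maps position -> required residue."""
    fixed = fixed or {}
    choices = [
        [fixed[i]] if i in fixed else sorted(self_energy[i])
        for i in range(len(self_energy))
    ]
    best = min(product(*choices), key=lambda s: total_energy(s, self_energy, pair_energy))
    return "".join(best), total_energy(best, self_energy, pair_energy)


# Hypothetical energy tables for three positions (values invented).
self_energy = [
    {"L": -1.0, "I": -0.8, "M": -0.5},
    {"A": -0.3, "G": -0.2, "V": 0.4},
    {"D": -0.6, "E": -0.7, "N": -0.4},
]
pair_energy = {
    (0, 1): {("I", "A"): -0.5, ("L", "V"): 0.6},
    (1, 2): {("G", "D"): -0.4},
}

free_sequence, free_energy = design(self_energy, pair_energy)
constrained_sequence, constrained_energy = design(self_energy, pair_energy, fixed={0: "L"})
print(free_sequence, round(free_energy, 2))
print(constrained_sequence, round(constrained_energy, 2))
```

In this toy case, fixing the first position changes the optimal residues at the other positions as well, because the pairwise terms couple neighboring choices.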
Table 3. BH3 sequences for template structures (bold) and for peptides designed on those templates using dTERMen.
Name | Sequence [a] | PDB ID |
---|---|---|
Heptad | ---∣--2---∣--3---∣--4--- | |
Position | efgabcdefgabcdefgabcdefg | |
FS2 | -QWVREIAAGLRRAADDVNAQVE- | 5UUK |
dF1 | -SYVDKIADVMREVAEKINSDLT- | |
dF2 | -SYIDKIADLIRKVAEEINSKLE- | |
dF5 | -SYVDKIADLMKKVAEKINSDLT- | |
dF6 | -SYIDKIADLIDKVVEEINSKLE- | |
PUMA | -QWAREIGAQLRRMADDLNAQYER | 5UUL |
dF3 | -SLLEKLAEELRQLADELNKKFEK | |
dF4 | -SLLEKLAEYLRQMADEINKKYVK | |
dF7 | -SLLEKLAEELAQLADELNKKFEK | |
dF8 | -SLLEKLAEYLAQMGDEINKKYVK | |
BIM | GRPEIWIAQELRRIGDEFNAYYA- | 2PQK |
dM1 | APKEKEVAETLRKIGEEINEALK- | |
dM2 | APYLEQVARTLRKIGEEINEALR- | |
dM5 | APKEKEVARTLIKIGEEINEALK- | |
dM6 | APYLEQVARTLLHIGMEINEALR- | |
BID-MM | EDIIRNIARHLAXVGDXBDRSI-- | 5C3F |
dM3 | DKTLEEIARELAKLAEEIDKEI-- | |
dM4 | DKTLEEIARWLARLALEIDKEI-- | |
dM7 | DKTLEEIARELLKLALEIDKEI-- | |
FS2 | -QWVREIAAGLRRAADDVNAQVER | 5UUM |
dM9 | -DIEQEIAEALKEVADELSKAIED | |
dM10 | -DVVLSVAETLRELADRLYEEINT | |
[a] X = position of hydrocarbon staple; B = norleucine.
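The heptad register given in the header of Table 3 can be used to look up the residue at any named position. The sketch below assumes that the three leading positions (e, f, g) belong to heptad 1, which is inferred from the header rather than stated; the function names are hypothetical.

```python
# Heptad letters for the 24 aligned columns of Table 3.
REGISTER = "efgabcdefgabcdefgabcdefg"


def heptad_labels(register=REGISTER, first_heptad=1):
    """Return labels such as '1e', '2a', ..., '4g', one per alignment column."""
    labels = []
    heptad = first_heptad
    for index, letter in enumerate(register):
        if letter == "a" and index > 0:
            heptad += 1  # a new heptad starts at every 'a'
        labels.append(f"{heptad}{letter}")
    return labels


def residue_at(sequence, position):
    """Residue of an aligned Table 3 sequence at a heptad position such as '3a'."""
    labels = heptad_labels()
    if len(sequence) != len(labels):
        raise ValueError("sequence must be aligned to the 24-column register")
    return sequence[labels.index(position)]


# Aligned sequences copied from Table 3.
BIM = "GRPEIWIAQELRRIGDEFNAYYA-"
DF6 = "-SYIDKIADLIDKVVEEINSKLE-"
DM6 = "APYLEQVARTLLHIGMEINEALR-"

print("BIM 3a/3f:", residue_at(BIM, "3a"), residue_at(BIM, "3f"))
print("dF6 3e:", residue_at(DF6, "3e"))
print("dM6 3b/3f:", residue_at(DM6, "3b"), residue_at(DM6, "3f"))
```

With this register, BIM carries leucine at 3a and aspartate at 3f, matching the conservation pattern described for native BH3 motifs.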
To illustrate the diverse sequences that can be designed by dTERMen, Fig. 2A shows logos built from 1000 sequences designed on each template, without any constraint on position 3b. These data, and designed sequences in Table 3 and in Fig. S2, confirm that peptides designed on different templates were highly variable. Particularly notable was the diversity observed at positions 3a and 3f. Although dTERMen overwhelmingly chose leucine at position 3a for template 5C3F, consistent with conservation of leucine at this position in native sequences, greater variation was observed at 3a when using other templates. Designs based on structure 5UUK included isoleucine or methionine more often than leucine. Position 3f is conserved as aspartate in the natural sequences, but dTERMen chose a variety of polar residues at this site for all templates.
To evaluate dTERMen designs, 17 out of the 18 peptides in Table 3 were selected for experimental testing. Sequence dM8 was not tested because it was only one mutation away from design dM7. The sequences chosen for testing, like all sequences resulting from the design protocol, were very different from any previously known BH3 sequences. Figs. 2B and C summarize the minimum number of mutations between the peptides we tested and any of the 13 native BH3 sequences in Table S1. Designed peptide binding to Bfl-1, Mcl-1, Bcl-xL, Bcl-w, and Bcl-2 was assayed by yeast-surface display. Binding data from yeast-surface display assays have been shown to correlate well with solution affinity measurements, and many BH3 peptides that are tight binders on the yeast cell surface have also been validated as high-affinity binders in solution (Dutta et al., 2010; Foight et al., 2014; Gai and Wittrup, 2007). Seven of the 8 peptides designed to bind Bfl-1 showed concentration-dependent binding that saturated at or below 10 nM Bfl-1. Eight of the 9 sequences designed to bind Mcl-1 also showed concentration-dependent binding, with apparent cell-surface dissociation constants estimated as < 100 nM (Figs. 2D, E). The results show that constraints on the helix-capping residues at positions 3b and 4b were not necessary for the designed peptides to bind their targets tightly. Peptides designed based on the 5UUM template, a structure of Mcl-1 bound to low-affinity ligand FS2, bound approximately 100-fold more tightly than did FS2 itself, supporting dTERMen as a way to improve the affinity of initial leads for which structures are available (Fig. 2, Table S6).
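The minimum-mutation comparison can be sketched as a Hamming distance over aligned sequences. Because the 13 native BH3 sequences of Table S1 are not reproduced here, the example uses three template peptides from Table 3 as stand-in references, so its output illustrates the calculation and is not a value from Fig. 2. Alignment columns containing a gap in either sequence are skipped, which is an assumption about how unequal lengths are handled.

```python
def compare(seq_a, seq_b, gap="-"):
    """Return (mismatches, compared columns) for two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches, len(pairs)


def percent_identity(seq_a, seq_b):
    """Percent identity over the ungapped aligned columns."""
    mismatches, compared = compare(seq_a, seq_b)
    return 100.0 * (compared - mismatches) / compared


def closest_reference(design, references):
    """Name and mutation count of the reference with the fewest mismatches to `design`."""
    return min(
        ((name, compare(design, seq)[0]) for name, seq in references.items()),
        key=lambda item: item[1],
    )


# Aligned sequences copied from Table 3.
DF1 = "-SYVDKIADVMREVAEKINSDLT-"
references = {
    "FS2": "-QWVREIAAGLRRAADDVNAQVE-",
    "PUMA": "-QWAREIGAQLRRMADDLNAQYER",
    "BIM": "GRPEIWIAQELRRIGDEFNAYYA-",
}

name, mutations = closest_reference(DF1, references)
identity = percent_identity(DF1, references["FS2"])
print(f"closest reference: {name} ({mutations} mutations)")
print(f"identity to FS2: {identity:.0f}%")
```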
Peptides dF6 and dM6 did not bind to their targets with high affinity. Peptide dF6 has a valine at position 3e, which is conserved as small (Ala or Gly) in native BH3 peptides, in previously reported designed peptides, and in all of the other dTERMen-designed peptides that we tested (Dutta et al., 2010, 2013; Jenson et al., 2017). Structural matches identified by dTERMen suggested that valine could be accommodated in the context of helix-helix interface geometries highly similar to the site in 5UUK (Fig. S5). Nevertheless, an all-atom model built using template 5UUK highlights clashes due to the close proximity of the Cα atom of dF6 position 3e and the backbone of position 88 in helix 5 of Bfl-1 (Fig. S4A), and valine may be too large to be accommodated at this site. For design dM6, we hypothesize that substitution of arginine and aspartate at positions 3b and 3f of BIM with leucine and methionine, respectively, and concomitant disruption of a charged network between the peptide and the protein, was destabilizing. These features are consistent with dM6 not binding to any of the Bcl-2 family members we tested (see below). At this time, we have not investigated the aspects of the model responsible for these residue choices.
There is substantial interest in developing Bcl-2 family paralog-selective inhibitors (Foight et al., 2014; Jenson et al., 2017; Opferman, 2016; Rezaei Araghi et al., 2018). To determine whether our designs cross-react with other anti-apoptotic family members, we tested binding of each peptide to Bfl-1, Mcl-1, Bcl-xL, Bcl-w, and Bcl-2. Interestingly, the Bfl-1 binders that were designed on the structure of PUMA in complex with Bfl-1 (5UUL) bound to multiple Bcl-2 family members. In contrast, peptides designed on 5UUK, which is the structure of Bfl-1-specific peptide FS2 bound to Bfl-1, were > 100-fold selective for Bfl-1, like FS2 itself. The data were less clear for Mcl-1 binders, some of which were selective (dM1, dM5) whereas others were not (dM2, dM3, dM4, dM7, dM9, dM10) (Fig. 3, Table S6).
High-resolution structures show how designed peptides engage Bfl-1 and Mcl-1
We solved crystal structures of dF1 and dF4 in complex with Bfl-1 and of dM1 and dM7 in complex with Mcl-1 (Fig. 4 and Table 4). The structure of dF1 in complex with Bfl-1, resolved to 1.58 Å, shows that this peptide binds very similarly to FS2 in template 5UUK (Fig. 4A). It is striking how similar the pocket-facing positions of the designed peptide dF1 and template peptide FS2 are, even though the sequence identity of these two peptides is low (27%) and no information about the FS2 sequence was used in the design process (Fig. S6A). Modeling dF1 onto the FS2 backbone in structure 5UUK indicated minor clashes, including between methionine at position 3a and residues in the P2 pocket of Bfl-1 (made by Met 75, Phe 95, and Glu 78), isoleucine at position 4a with Val 44 in helix 2 of Bfl-1, and valine at position 3d with Val 48 and Val 44 of helix 2 of Bfl-1 (Fig. S4D). A more substantial clash was anticipated between valine at position 2g and Leu 52 of helix 2 of Bfl-1 (Fig. S4F). The crystal structure of dF1 bound to Bfl-1 shows how small adjustments accommodate these residues. For example, in the region around valine at 2g, backbone adjustments are seen for Bfl-1 residues 50-63 that make room for this residue and lead to a modest divergence of the N-terminus of FS2 in 5UUK compared to dF1 in our new structure (Fig. 4A).
Table 4.
 | Bfl-1:dF1 PDB ID 6MBB | Bfl-1:dF4 PDB ID 6MBC | Mcl-1:dM1 PDB ID 6MBD | Mcl-1:dM7 PDB ID 6MBE |
---|---|---|---|---|
Data Collection | ||||
Space Group | P 1 21 1 | P 1 21 1 | P 21 21 21 | P 32 2 1 |
Cell parameters | ||||
a, b, c | 43.2 42.9 47.7 | 43.5 42.9 46.7 | 64.8 69.7 84.9 | 80.8 80.8 58.0 |
α, β, γ | 90 115.96 90 | 90 114.2 90 | 90 90 90 | 90 90 120 |
Rmeas | 0.078 (0.399) | 0.078 (0.43) | 0.137 (0.981) | 0.122 (.070) |
Rpim | 0.029 (0.169) | 0.036 (0.259) | 0.047 (0.398) | 0.045 (0.338) |
Mean I/σ(I) | 30.9 (2.9) | 22.4 (2.0) | 15.6 (0.66) | 18.3 (1.5) |
Completeness (%) | 91.5 (77.8) | 94.6 (74.4) | 95.8 (67.3) | 97.3 (79.0) |
Redundancy | 6.5 (4.3) | 4.1 (1.9) | 7.7 (3.9) | 6.7 (3.4) |
Refinement | ||||
Resolution (Å) | 38.86 - 1.59 (1.644 - 1.587) | 42.56 - 1.75 (1.815 - 1.752) | 24.3 - 1.95 (1.994 - 1.945) | 29.94 - 2.25 (2.327 - 2.247) |
Unique Reflections | 19602 (1654) | 13573 (1193) | 25709 (1922) | 9363 (833) |
Rwork/Rfree | 0.17/0.20 (0.24/0.26) | 0.18/0.21 (0.26/0.33) | 0.20/0.24 (0.33/0.37) | 0.12/0.22 (0.25/0.26) |
Number of non-hydrogen atoms | 1623 | 1541 | 3047 | 1482 |
Average B-factors | 30.2 | 42.2 | 35.8 | 49.1 |
RMSD | ||||
Bond lengths (Å) | 0.006 | 0.003 | 0.006 | 0.002 |
Bond angles (°) | 0.77 | 0.54 | 0.74 | 0.41 |
Values in parentheses are for the highest-resolution shell.
We solved the structure of dF4 bound to Bfl-1 to 1.75 Å. The C-terminal end of the peptide adopts a different conformation than does PUMA BH3 bound to Bfl-1 in structure 5UUL (Fig. 4B). In template 5UUL, the helix begins to unwind around position 4d, but in the redesigned structure the helix unwinds 3 residues earlier. dTERMen identified relatively few matches for structural elements at the C-terminus of 5UUL, which may have contributed to the deviation from the design template (Fig. S7). At the N-terminus, the sequence of dF4 is very different from that of PUMA; there is only one identical residue within the first 10 residues. An important change was glycine (in PUMA) to alanine (in dF4) at position 2e. In 5UUL, this site is located at a tightly packed helix-helix crossing. Although only glycine can fit when modeled on the rigid design template, TERM statistics indicated that alanine is common in very similar geometries. The solved structure shows how the dF4 helix shifts slightly to accommodate alanine, along with other sequence changes.
We solved the structure of dM1 bound to Mcl-1 to 1.95 Å and found that it binds very similarly to the BIM BH3 peptide in design template 2PQK (Fig. 4C). However, the structure of dM7 bound to Mcl-1 at 2.25 Å resolution revealed a substantial change in the binding mode of the peptide (Fig. 4D, Fig. S6B-D). The helix is shifted in the groove by 3.43 Å and rotated by 19 degrees along the helix axis, relative to the position of BID-MM in the design template structure 5C3F. A shift of the helix in the groove by approximately one-half helical turn re-positions leucine at 3a relative to what is observed in structures of native BH3 peptides bound to Bcl-2 family proteins. Furthermore, the canonical BH3 interaction of aspartate at 3f with Mcl-1 Arg 263 is replaced by a salt bridge with an aspartate at position 4b in the peptide (Fig. S6C). In Mcl-1, alpha helix 4 is rearranged relative to its position in the template, to accommodate the unusual sequence. The reorganization may have resulted from introducing two leucine residues at peptide positions 3b and 3f. Not only does leucine at 3f remove the aspartate residue at this position in BIM, BID and PUMA, but leucine at 3b is predicted to interfere with an intra-molecular salt bridge between Mcl-1 residues 256 and 263. The shift of peptide dM7 observed in the crystal structure restores the salt-bridge network between Mcl-1 and the peptide, using a different peptide residue, as shown in Fig. S6C. One complication in evaluating this structure is that there are close crystal-packing contacts between two copies of the Mcl-1:dM7 complex, near the C-terminal end of the binding groove and involving alpha helix 4 of Mcl-1 (Fig. S6B,E,F). We cannot rule out the possibility that crystal packing favored population of a minor structural species.
In summary, x-ray crystallography revealed that backbone positioning of two of the crystalized designs (dF1 and dM1) were sub-Ångstrom matches to their design templates, over most of the length of the peptide. Another peptide (dF4) bound in a geometry that shared high similarity with its template, but the remaining design (dM7) bound in an unexpected, dramatically shifted orientation.
Discussion
Using dTERMen, we were able to rapidly design high-affinity binders of Bcl-2 family proteins without the need for explicit modeling of complex structures or experimental library screening. Previous work has shown that this is not a trivial task. For example, in a library of random peptides, nearly all fail to bind Mcl-1 detectably (Lee et al., 2009). Additionally, even in carefully designed libraries containing peptides with fewer than 6-8 mutations compared to natural BH3 domains, most sequences fail to bind Bfl-1 and Mcl-1 (Jenson et al., 2017). In contrast, using dTERMen, we found that 15 of 17 designs bound with native-like affinity, even though the sequences were 14-22 mutations away from known BH3 binders (Fig. 2 B-C).
The dTERMen protocol provided access to diverse sequences that are dissimilar to native BH3 motifs. Some of the tight binders we discovered lack the highly conserved leucine and aspartate residues common to all known, native BH3 sequences (Table 3, Fig. S2), demonstrating that dTERMen is a useful tool for discovering binders that cannot be predicted based on conserved sequence features. Designing on different structural templates gave rise to different solutions. This may seem to be at odds with our finding that dTERMen is robust to small differences in input structure (Fig. 1), but we deliberately chose design templates to sample different peptide docking geometries. We expected these templates to match with different TERMs from the PDB, and thus to generate different sequence predictions. Templates 5UUL and 2PQK are structures of complexes with native, high-affinity BH3 peptide binders (reported dissociation constants of ~1 nM) (Fire et al., 2010; Jenson et al., 2017). Other templates, e.g. 5C3F and 5UUM, featured peptides that bound their targets more than 3 orders of magnitude more weakly (Jenson et al., 2017; Miles et al., 2016). It is notable that template structures for both high-affinity and low-affinity peptide complexes led to high-affinity peptide binders when used as input to dTERMen. Designing on other solved structures could provide access to even greater diversity (Fig. S2A). Going beyond solved structures, it may be possible to perform dTERMen design on predicted structures with binding modes that have not been previously observed.
A set of designs with diverse sequences is more valuable than a single design optimized for affinity, because it provides opportunities to optimize pharmacological properties not related to binding. Our designed peptides have formal net charges ranging from −7 to +1, predicted helical content ranging from 0.7 to 69.7% and predicted hydrophobicity of 0.03 to 0.48 (Table S5). These properties could affect whether these peptides are disruptive to membranes and how readily they can be delivered to cells. Several studies have shown that the cell permeability of stapled helical peptides depends on peptide properties including charge and hydrophobicity (Bird et al., 2016; Rezaei Araghi et al., 2018). Different sequences will also have different cross-reactivity, immunogenicity, and protease sensitivity, so having many options to choose from increases the chances of developing useful reagents and lead therapeutics. Interestingly, design using dTERMen is compatible with imposing constraints on peptide properties such as net charge, so if the desired physical characteristics of a peptide inhibitor are known, they can be used to direct the search into promising sequence spaces.
The dTERMen scoring potential is based on sequence statistics for structural elements observed repeatedly in nature. There is no formal relationship between these statistics and protein stability or affinity, so the scoring may reflect any number of evolutionary pressures including stability, specificity, folding kinetics, solubility, or other factors. We interpret the success of dTERMen as evidence that whatever evolutionary forces contribute to the statistics, there is a strong signal related to the free energy of the sequence adopting the evaluated structure. The fact that we designed helix-helix interactions in this project, which are common in the PDB, may be part of the reason dTERMen performed so well. Notably, however, the structural database used by dTERMen in this work included single-chain structures only, so no information from other helix-into-groove interfaces (or any other types of interfaces) was used in data mining. Nevertheless, we do not yet know what success rates we will achieve on other problems, but because more structures are deposited in the PDB every day, we expect the range of accessible design targets to increase over time (Zheng and Grigoryan, 2017).
One attractive feature of dTERMen is that it does not require explicit structural modeling or minimization; the design optimization is performed in sequence space. Although the PDB structure-mining that is required to build the scoring function is time consuming (e.g. it takes 7 to 12 CPU hours per structure to generate scoring functions used here), once such a function is derived, it is possible to perform design, or to evaluate millions of sequences, in seconds. Another advantage of dTERMen is that there is a structural “fuzziness” built in, because the sequence statistics used for modeling are derived from close, but not exact, matches of TERMs. This makes the method more robust than FoldX to small variations in input structure, as shown in our benchmark testing, and also accounts for some structural relaxation. In this work, we observed multiple examples where a mutation was accommodated that would not have been designed if modeling was performed on a rigid scaffold (Fig. S4). On the other hand, dTERMen design failures may result from over-packing the protein-peptide interface beyond what can be accommodated by small structural rearrangements. This may be what happened for dF4, the structure of which diverged from the design template structure at the C-terminal end of the peptide (Fig. 4), and for dF6, which did not bind tightly to Bfl-1. Combining dTERMen with a post-analysis procedure that includes all-atom modeling using peptide redocking (Zheng et al., 2015b) or MD simulation (Davey and Chica, 2014) could be one way to recognize sequences that cannot be accommodated. Although this would increase the computational costs, such a secondary evaluation could be performed for a modest number of promising candidate designs.
One unexpected result from this work is that the specificity profiles of the designs were template dependent. Although multiple templates provided designs with paralog-specific binding preferences, this was particularly striking for the FS2 template. Although no off targets were considered during design, the peptides designed using the FS2 structure were highly Bfl-1 selective. In fact, these peptides provide outstanding leads for development as Bfl-1 targeting agents. The specificity of peptides dF1, dF2 and dF5 may arise because FS2 adopts a non-canonical binding mode to engage Bfl-1 that has not been observed for natural BH3 ligands (Jenson et al., 2017). We found that FS2 is shifted in the pocket towards a region of Bfl-1 that contains many residues that are unique to Bfl-1, compared to the other Bcl-2 family paralogs. It may be that these residues are under less evolutionary pressure to maintain binding to native BH3 ligands, and so can be used to achieve specificity (Fig. S8). This is consistent with the idea that a peptide that makes contacts outside of the conserved binding cleft can use these contacts to achieve intra-family specificity (Berger et al., 2016; Procko et al., 2014).
This proof-of-principle study makes us enthusiastic about the potential of dTERMen for designing peptide binders. The ease of use, fast run times, and high success rates on a difficult problem provide compelling evidence of the promise of this approach. Future applications could exploit dTERMen scoring speed by screening proteomes to predict candidate binding partners, or could leverage the robustness of dTERMen to scaffold variation by designing on low resolution structures. There are ample opportunities to improve dTERMen further, for example by combining this sequence-based design approach with all-atom modeling to better assess whether mutations can be accommodated by structural relaxation. We look forward to tackling increasingly difficult problems and moving the use of TERM statistics into the mainstream of modern protein design.
STAR Methods
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Amy E. Keating (keating@mit.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
EBY100 yeast cells were used for binding affinity estimation. Cells were passaged overnight at 30 °C and diluted to OD600 of 0.005-0.01 in SD + CAA (5 g/L casamino acids, 1.7 g/L yeast nitrogen base, 5 g/L ammonium sulfate, 10.2 g/L Na2HPO4-7H2O and 8.6 g/L NaH2PO4-H2O, 2% glucose) and grown to an OD600 of 0.1–0.6. Cell cultures were then diluted 25-fold with SG + CAA (5 g/L casamino acids, 1.7 g/L yeast nitrogen base, 5.0 g/L ammonium sulfate, 10.2 g/L Na2HPO4-7H2O and 8.6 g/L NaH2PO4-H2O, 2% galactose) to induce peptide expression and grown for 20-24 hr at 30 °C.
METHOD DETAILS
dTERMen scoring function
A full description of the dTERMen procedure, along with extensive validation and benchmarking, is given in Zhou et al. (Zhou et al., 2018). For completeness, we briefly outline the method here, at a high level. Given a target protein structure, D, for which an appropriate amino-acid sequence is needed, dTERMen begins by defining effective self-energies for each amino acid at each position of D and effective pair interaction energies between amino acids at pairs of positions. We collectively refer to these as energy parameters (EPs) and their values in our procedure are deduced from statistics of structural matches to appropriately defined TERMs that make up D. The matches are obtained by searching a structural database. In this work, the database was a subset of the PDB containing 14,546 chains from X-ray structures with resolution better than 2.6 Å, pruned for redundancy at 30% sequence identity. Importantly, this means there was no quaternary structural information present in the database, and all insights on how to design domain-peptide interfaces were derived from intra-chain examples.
The fundamental idea behind our procedure is to define TERMs from D in a way that is targeted at isolating individual EP contributions. For example, to capture the pairwise dependence between amino-acid identities at positions i and j (i.e., the pair EP), we define a TERM that consists of residues i, j, and their surrounding backbone fragments (e.g., ± 2 residues around each residue). By obtaining a sufficiently large list of closest matches to the generated motif (pruned for redundancy), one can analyze the co-dependence between identities at i and j. One complicating factor is that identities at the two positions are also biased by the specific environments from which the matches originate. And, in some cases, this bias could affect the apparent co-dependence. E.g., if the two positions are usually either both buried or both exposed within matches, it may appear that there is a direct favorable interaction between amino acids of similar hydrophobicity at i and j. Such effects are corrected for in dTERMen by computing EPs as log-odds ratios between observed and expected numbers of observations (e.g., observations of amino-acid pairs in this case), where the expectation is calculated by accounting for the effect of the environment in the structures from which matches originate. Self-EP contributions arising from interactions between a residue and nearby backbone fragments are computed similarly. These include interactions with both the local sequence-contiguous backbone (the own-backbone energy) and backbone fragments proximal in 3D (the near-backbone energy). These contributions augment pre-tabulated amino-acid self-energies associated with different backbone φ/Ψ and ω dihedral angles and burial states to form the final EP contributions.
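As an illustration only, the log-odds idea described above can be sketched in a few lines of Python. This is not the dTERMen implementation: the function name, the pseudocount, and the sign convention (lower energy is more favorable) are assumptions made for this sketch, and the expected count is simply passed in, whereas dTERMen derives it from the environments of the matches (see Zhou et al., 2018).

```python
import math

def log_odds_energy(observed, expected, pseudocount=1.0, kT=1.0):
    """Return -kT * ln(observed / expected).

    The pseudocount (an assumption of this sketch) avoids log(0)
    when an amino-acid pair is never observed among the matches.
    """
    return -kT * math.log((observed + pseudocount) / (expected + pseudocount))

# Hypothetical counts for one amino-acid pair across the matches of a pair TERM.
e_favorable = log_odds_energy(observed=60, expected=30)    # seen more often than expected
e_unfavorable = log_odds_energy(observed=10, expected=30)  # seen less often than expected
```

Under this convention, a pair that occurs more often than its environment-corrected expectation receives a negative (favorable) energy, and a pair that is depleted receives a positive one.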
In choosing the number of closest matches to retain for each TERM, dTERMen aims to balance two opposing effects: large numbers of matches are desirable for giving a strong statistical signal, while close matches are preferred because these more accurately report on the sequence determinants of the original query motif. As a compromise between these two, dTERMen imposes both a maximal RMSD cutoff as well as a minimal number of required matches (generally ~1,000 matches, though this varies slightly by TERM type, see Zhou et al. (Zhou et al., 2018)), with the RMSD cutoff applied only once the number of matches exceeded the required minimum.
The above computed contributions are compiled into an energy table of one- and two-body contributions, after which Integer Linear Programming (ILP) is used to identify the sequence with the optimal score (Grigoryan et al., 2009; Kingsford et al., 2005). Note that all energies are defined on the sequence level, such that optimization can proceed directly in sequence space, without the need to build explicit atomic structures. And yet, because each EP contribution arises from an ensemble of TERM matches, a certain amount of implicit backbone flexibility is built into the scoring function. Fig. S1 illustrates examples of TERMs and their matches used in dTERMen designs performed in this work.
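One way to read the match-retention rule is sketched below. The function name and the default RMSD cutoff are hypothetical (the actual cutoffs are given in Zhou et al., 2018); only the logic of keeping a minimum number of closest matches and applying the cutoff beyond that minimum follows the description above.

```python
def select_matches(rmsds, min_matches=1000, rmsd_cutoff=1.0):
    """Keep the min_matches closest matches regardless of RMSD, then keep
    additional matches only while they fall within rmsd_cutoff.

    rmsd_cutoff=1.0 is a placeholder value for illustration.
    """
    ranked = sorted(rmsds)
    kept = ranked[:min_matches]
    kept += [r for r in ranked[min_matches:] if r <= rmsd_cutoff]
    return kept

# Toy example with a required minimum of two matches.
common_motif = select_matches([0.2, 0.5, 0.9, 1.5, 2.0], min_matches=2)
rare_motif = select_matches([1.5, 2.0, 3.0], min_matches=2)
```

In the toy example, the common motif retains its three matches within the cutoff, while the rare motif still retains its two closest matches even though both exceed the cutoff.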
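The structure of such an energy table, and what it means to optimize over it, can be sketched as follows. This sketch substitutes exhaustive enumeration for ILP, which is only feasible for toy problems; the data layout, the function names, and the energy values are assumptions for illustration and do not come from dTERMen.

```python
from itertools import product

def score_sequence(seq, self_e, pair_e):
    """Sum one-body and two-body terms for a sequence (tuple of residue letters)."""
    total = sum(self_e[i][aa] for i, aa in enumerate(seq))
    for (i, j), table in pair_e.items():
        total += table.get((seq[i], seq[j]), 0.0)  # missing pairs contribute zero
    return total

def best_sequence(self_e, pair_e):
    """Enumerate every sequence and return the lowest-scoring one.

    Exhaustive search stands in for ILP here and scales exponentially
    with the number of positions.
    """
    alphabets = [sorted(position) for position in self_e]
    return min(product(*alphabets), key=lambda s: score_sequence(s, self_e, pair_e))

# Hypothetical two-position table with two allowed residues per position.
self_e = [{"A": 0.0, "L": -1.0}, {"A": -0.5, "D": 0.2}]
pair_e = {(0, 1): {("L", "D"): -2.0, ("L", "A"): 0.5}}
best = best_sequence(self_e, pair_e)
```

Because the score is a plain sum over table entries, evaluating a sequence requires no structure building, which is what makes scoring millions of sequences fast.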
dTERMen design protocol
When the design problem pre-specifies some of the residues in the target structure D, as is the case in the present application, the calculations remain the same but some re-shuffling between pair and self EPs takes place. For example, when position i in an interacting pair i-j is fixed, the TERM-derived effective pair EP between the two is added to the self-energy of position j in the final table. Because in the present case the sequence of the entire domain was always fixed, the only pairwise contributions in the final table were those between pairs of peptide positions.
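The re-shuffling between pair and self EPs can be sketched as below, assuming energies are stored in dictionaries keyed by position. The function name, the data layout, and the numerical values are hypothetical.

```python
def fold_fixed_positions(self_e, pair_e, fixed):
    """Fold pair energies involving fixed positions into self energies.

    self_e: {position: {amino_acid: energy}}
    pair_e: {(i, j): {(aa_i, aa_j): energy}}
    fixed:  {position: amino_acid} for positions whose identity is pre-specified
    Returns (self, pair) tables defined over variable positions only.
    """
    new_self = {i: dict(e) for i, e in self_e.items() if i not in fixed}
    new_pair = {}
    for (i, j), table in pair_e.items():
        if i in fixed and j in fixed:
            continue  # constant offset; does not change the ranking of sequences
        if i in fixed:
            for aa in new_self[j]:
                new_self[j][aa] += table.get((fixed[i], aa), 0.0)
        elif j in fixed:
            for aa in new_self[i]:
                new_self[i][aa] += table.get((aa, fixed[j]), 0.0)
        else:
            new_pair[(i, j)] = table
    return new_self, new_pair

# Position 0 plays the role of a fixed domain residue, position 1 a peptide residue.
toy_self = {0: {"R": 0.0}, 1: {"D": 0.1, "L": -0.3}}
toy_pair = {(0, 1): {("R", "D"): -1.5, ("R", "L"): 0.4}}
folded_self, folded_pair = fold_fixed_positions(toy_self, toy_pair, fixed={0: "R"})
```

After folding, the only remaining pair terms are those between two variable positions, which in the present application are peptide-peptide pairs.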
The two versions of dTERMen used here differ in how TERMs for computing the near-backbone energy for residue i are defined (see Zhou et al. for full details (Zhou et al., 2018)). The ideal TERM for this purpose would include the residue i, its local backbone fragment, all residues with the potential to interact with i (through either side-chain or backbone - i.e., influencing residues), and their respective local backbone fragments. If such a TERM has a sufficient number of close structural matches in the database, then this definition works well and the two dTERMen versions will both pick this motif (producing the same result). Because near-backbone TERMs can have many segments (e.g., three potential interacting positions would give rise to a four-segment TERM), they may not always be represented well enough in the database to derive confident sequence statistics on the amino-acid preferences at i. In this case, one is forced to consider the effect of the local backbone geometry on position i as an aggregate of effects from sub-motifs, and the two versions deal with this differently. Version 35 attempts to identify large sub-motifs, each consisting of i and as many of the influencing residues as possible (along with local backbones), such that sub-motifs do not overlap and together cover all influencing residues. This takes a considerable amount of database searching, as many trial sub-motifs have to be queried. Version 34 speeds this process up, at the cost of some detail, by considering just one sub-motif that includes the most “important” influencing residues (assessed via our geometric measure of contact degree (Zheng et al., 2015a)), on the assumption that this motif dominates sequence statistics.
Applying the above procedure to the domain-peptide complexes in this study, with only peptide positions being variable, involved defining (averaged over all templates) ~22 self TERMs, ~220 pair TERMs, and ~44 near-backbone TERMs.
Structural modeling of designs
We used pyRosetta (Chaudhury et al., 2010) (Linux release r53335) to generate structural models for dTERMen-designed sequences emergent from ILP optimization. This was done by performing fixed-backbone side-chain repacking of all residues in the domain-peptide complex (peptide residues taken from the dTERMen-optimized sequence) using the talaris2013 forcefield (Chaudhury et al., 2010) and default parameters in pyRosetta via “standard_packer_task” and “PackRotamersMover” objects. For residues where there was evidence of crowding, all backbone-dependent rotamers of a residue of interest were manually inspected using PyMol. Fig. S4 was made by choosing the least clashing rotamer.
Sequence logo generation
We solved for the optimal peptide sequence according to dTERMen for each of the trimmed-peptide structures used in the interaction prediction benchmark (structures are given in Table S2), following the design protocol outlined above. No constraints were imposed during design. From these designs, we constructed a sequence logo using WebLogo (Crooks et al., 2004) that is shown in Fig. S2A. For the five chosen design templates, in addition to solving for the optimal sequence for each by ILP, we also performed Monte Carlo (MC) sampling to generate an ensemble of well-scoring sequences, as a way of better characterizing the sequence space predicted to be favorable by dTERMen. We ran 1000 independent MC trajectories for each template starting with a random sequence. Each trajectory involved 100,000 iterations, at each of which a random mutation was evaluated for acceptance according to the Metropolis criterion. The sampling was performed at constant temperature with kT equal to 1 (this was also the temperature used to derive dTERMen statistical energies). The final accepted sequence from each of the 1000 trajectories was used to build an MSA for each template and to generate the logos in Fig. 2A. No constraint was imposed at position 3b.
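A single Metropolis trajectory of the kind described above can be sketched as follows. The function signature, the single-site mutation move, and the toy energy function are assumptions for this sketch; in the actual protocol the energy would be the dTERMen score for a given template.

```python
import math
import random

def metropolis_sample(energy, length, alphabet, n_steps, kT=1.0, rng=None):
    """Run one constant-temperature Metropolis trajectory from a random sequence
    and return the final accepted sequence with its energy."""
    rng = rng or random.Random()
    seq = [rng.choice(alphabet) for _ in range(length)]
    e_cur = energy(seq)
    for _ in range(n_steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(alphabet)  # propose a random point mutation
        e_new = energy(seq)
        delta = e_new - e_cur
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            e_cur = e_new                # accept the mutation
        else:
            seq[pos] = old               # reject and restore the previous residue
    return "".join(seq), e_cur

def toy_energy(seq):
    """Hypothetical energy: a penalty of 5 for every residue that is not leucine."""
    return 5.0 * sum(aa != "L" for aa in seq)

final_seq, final_e = metropolis_sample(
    toy_energy, length=5, alphabet="ALD", n_steps=2000, kT=1.0, rng=random.Random(0)
)
```

Repeating such a trajectory many times and collecting the final sequences yields an alignment from which a logo can be built.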
Designed-peptide property prediction
Predicted helical content for designed peptides was obtained from the AGADIR web server (Muñoz and Serrano, 1997). Predicted net charges and hydrophobicity were obtained using the HelixQuest server (Gautier et al., 2008).
Compilation of Bcl-2 complex structures
Sequences for human Bcl-xL, Bfl-1 and Mcl-1 were retrieved from Uniprot (UniProt Consortium, 2018) and used as BLAST queries against the PDB (Berman, 2000) (7 Nov 2017). Matched structures were downloaded and standardized by transforming selenomethionine to methionine and removing hydrogens and atoms designated as HETATM. Sequences were aligned and renumbered based on their corresponding Uniprot template sequence using Needle (Needleman and Wunsch, 1970). Regions that were not matched or that were poorly aligned with the Uniprot sequence were removed from the structure. Chains of length 20-39 residues with more than 30% of their Voronoi surface in contact with the Bcl-2 proteins were identified as interacting peptides (McConkey et al., 2003). Unless specified, peptides containing non-natural amino acids were removed from the dataset. Only the first model in deposited NMR ensembles was retained. If a structure included multiple complexes in the asymmetric unit, these were split into new files and analyzed separately.
Comparing peptide binding geometry
For every complex, residues within 8 Å of any peptide atom were considered part of the binding interface and all complexes were structurally aligned using only their binding interface Cα atoms, using 3DCOMB (Wang et al., 2011). To automatically define a common reference residue for all bound peptides, we used a graph-based procedure. Each peptide Cα in the set of superimposed binding interfaces was represented as a node, and an edge was created if the distance between 2 nodes was below a threshold. The distance threshold was initially set at 2 Å and gradually increased by 0.1 Å until the largest clique in the graph included one residue from each complex. This clique represented a set of Cα atoms - one in each structure - all within a distance threshold. Residues in this largest clique were arbitrarily given peptide residue number 100; this reference residue corresponds to residue 95 in structure 3FDL. Using this registry, peptides were trimmed to generate a 20-residue long segment chosen by structural inspection to include positions that make extensive contacts with the protein and that are unlikely to be influenced by crystal contacts in the templates used for modeling. This region corresponds to peptide positions 86 to 105 in structure 3FDL. Structures without a complete 20-mer peptide were not used. Binding interfaces were redefined using trimmed peptides, by taking all peptide atoms plus protein residues within 8 Å of any peptide atom.
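The search for a common reference residue can be sketched as below. Instead of building the graph and extracting its largest clique, this sketch directly searches for one Cα per complex such that all chosen atoms are mutually within the threshold, which is the stopping condition described above. The function name, the backtracking search, and the upper limit on the threshold are assumptions; when several valid sets exist, the first one found is returned.

```python
import math
from itertools import count

def find_reference_residues(peptides, start=2.0, step=0.1, max_threshold=10.0):
    """peptides: one list of (x, y, z) Cα coordinates per superimposed complex.

    Returns (threshold, indices), where indices[k] is the chosen residue in
    complex k, or None if no set is found below max_threshold.
    """
    def extend(chosen, thr):
        k = len(chosen)
        if k == len(peptides):
            return chosen
        for idx, xyz in enumerate(peptides[k]):
            # The candidate must be within thr of every residue chosen so far.
            if all(math.dist(xyz, peptides[c][i]) < thr for c, i in enumerate(chosen)):
                found = extend(chosen + [idx], thr)
                if found is not None:
                    return found
        return None

    for n in count():
        thr = start + n * step  # raise the threshold by one step per round
        if thr > max_threshold:
            return None
        found = extend([], thr)
        if found is not None:
            return round(thr, 1), found

# Three toy "peptides" laid out along the x axis.
toy_peptides = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(3.9, 0.0, 0.0), (7.7, 0.0, 0.0)],
    [(-3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.8, 0.0, 0.0)],
]
reference = find_reference_residues(toy_peptides)
```

The returned indices identify the residue in each complex that would be assigned the shared reference number.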
Scoring protein-peptide interactions
Structural scoring functions dTERMen (described above), FoldX4.0 and Rosetta were tested for their ability to predict peptide-protein binding affinity using binding data obtained using the SORTCERY protocol (Alford et al., 2017; Lewis and Kuhlman, 2011; Reich et al., 2015; Schymkowitz et al., 2005). Scoring was based on trimmed-peptide structures. Each structure was used as a template input for dTERMen, leading to a scoring function for that template, i.e. a function that can score any peptide binding to the target protein in the template-structure binding mode. FoldX4.0 was used to predict binding affinity by first using FoldX4.0’s “repair” function. Then, for each peptide in the SORTCERY dataset, the repaired template was transformed using the “mutate” function to generate the sequence of the peptide query and scored using the “complex” function. For Rosetta scoring, complex structures generated by FoldX were relaxed with Rosetta (Nov 2017 version rosetta_bin_linux_2017.08.59291, “relax” command) using Talaris2014 or BetaNov force fields (Alford et al., 2017). The default parameters of 5 minimization cycles consisting of 4 rounds of repacking were used for the relaxation protocol. Relaxed structures were run through the Rosetta InterfaceAnalyzer module, and the “dG_separated” values were used as the predicted binding energy. This score describes the difference in Rosetta energy of interface residues between the complex structure and corresponding separated chains. For the sake of simplicity in the reporting of benchmarking results, only the latest scoring function of Rosetta (BetaNov) and dTERMen (35) are discussed. dTERMen scoring function 34 and the Rosetta Talaris2014 force field yield benchmark performance similar to that of these newer versions, and values can be found in Table S2.
Interaction prediction benchmark
The predictive power of the different structural scoring functions and protocols was assessed by three metrics. First, each method’s ability to discriminate the top 20% tightest-binding peptides from the 20% weakest binders was assessed by calculating the Area Under the Curve (AUC) of the Receiver operating characteristic (ROC) curve. Next, precision was evaluated by calculating the correlation between the binding energy determined by SORTCERY, in kcal/mol, and each method’s predicted binding energy (in arbitrary units). Finally, we computed the percentage of the top 10% of binders from amped SORTCERY experiments that were found in the top 10% of predicted binders. Multiple templates were tested for each protein, and predictive power was evaluated for each template individually. The average performance and standard deviation of performance over all templates was computed and represents the expected value if a random template is chosen. We also assessed prediction performance using the template that gave the lowest energy for each sequence.
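The three metrics can be sketched in plain Python as follows. This is a self-contained illustration and not the benchmark code, which used scikit-learn and SciPy; the function names are hypothetical, Pearson correlation is assumed because the type of correlation is not specified above, and lower energies are assumed to mean tighter binding for both experiment and prediction.

```python
import math

def auc_top_vs_bottom(exp_e, pred_e, frac=0.2):
    """ROC AUC for separating the tightest from the weakest binders, computed
    as the fraction of (tight, weak) pairs ranked correctly by the prediction."""
    order = sorted(range(len(exp_e)), key=lambda i: exp_e[i])
    n = max(1, int(len(order) * frac))
    tight, weak = order[:n], order[-n:]
    wins = 0.0
    for t in tight:
        for w in weak:
            if pred_e[t] < pred_e[w]:
                wins += 1.0
            elif pred_e[t] == pred_e[w]:
                wins += 0.5  # ties count as half
    return wins / (len(tight) * len(weak))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_overlap(exp_e, pred_e, frac=0.1):
    """Percentage of the experimental top fraction found in the predicted top fraction."""
    n = max(1, int(len(exp_e) * frac))
    top_exp = set(sorted(range(len(exp_e)), key=lambda i: exp_e[i])[:n])
    top_pred = set(sorted(range(len(pred_e)), key=lambda i: pred_e[i])[:n])
    return 100.0 * len(top_exp & top_pred) / n

# Toy data: ten peptides with a prediction that is a linear function of experiment.
exp_energies = [float(e) for e in range(-10, 0)]
pred_energies = [2.0 * e + 3.0 for e in exp_energies]
```

With these toy data the prediction ranks every peptide correctly, so all three metrics take their maximal values.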
Bcl-2 paralog residue conservation
The Bfl-1 sequence was aligned with the sequences of Bcl-xL, Mcl-1, Bcl-2, and Bcl-w using ClustalW (Larkin et al., 2007). Each residue in Bfl-1 was scored for sequence similarity to the corresponding residue in each of the other proteins using the Blosum62 matrix (Henikoff and Henikoff, 1992). Substitutions with scores ≥ 0 were considered similar. To display amino-acid conservation at each position on the Bfl-1 structure, as shown in Fig. S8, each residue was colored by the number of proteins with amino acids similar to the one in Bfl-1 at that position.
Protein and peptide purification
Myc-tagged human Mcl-1 (residues 172-327), Bfl-1 (residues 1-151), Bcl-2 (residues 1-217), Bcl-w (residues 1-164), and Bcl-xL (residues 1-209) were used for binding assays. Untagged Bfl-1 (residues 1-151) and Mcl-1 (residues 172-327) were used for crystallography. The proteins used in this study were purified as previously described (Dutta et al., 2010) and frozen at −80 °C. The peptides used for crystallography were synthesized at the MIT biopolymers facility with N-terminal acetylation and C-terminal amidation and were purified by HPLC on a C18 column with a linear gradient of acetonitrile and water. Purified peptides were lyophilized and resuspended in DMSO. Peptide masses were confirmed by MALDI-TOF mass spectrometric analysis.
Yeast clones
EBY100 yeast cells were transformed using the Frozen-EZ Yeast Transformation II Kit (Zymo Research) according to the manufacturer’s protocol. For a plasmid backbone, we used the PUMA PCT plasmid (Jenson et al., 2017) and digested it with XhoI (NEB) and NheI-HF (NEB) according to the manufacturer's protocol. The inserts were constructed by PCR by mixing the two extension primers with the sense and antisense primers (Table S7) in a 5:5:1:1 ratio to create a sequence that encodes the desired peptide flanked with at least 40 bp of the plasmid sequence on either side of the insertion site to facilitate homologous recombination. The inserts and plasmid backbones were mixed at a 5 to 1 ratio for transformation. The transformation mixture was spread onto SD + CAA plates (5 g/L casamino acids, 1.7 g/L yeast nitrogen base, 5 g/L ammonium sulfate, 10.2 g/L Na2HPO4-7H2O and 8.6 g/L NaH2PO4-H2O, 2% glucose, 15-18 g/L agar, 182 g/L sorbitol) and grown at 30 °C for 2 to 3 days. To confirm each strain, colony PCR followed by sequencing was performed on single colonies. Sequence-verified colonies were grown overnight in SD + CAA (5 g/L casamino acids, 1.7 g/L yeast nitrogen base, 5 g/L ammonium sulfate, 10.2 g/L Na2HPO4-7H2O and 8.6 g/L NaH2PO4-H2O, 2% glucose). The saturated overnight cultures were diluted with 40% glycerol to a final glycerol concentration of 15% and stored at −80 °C.
Yeast growth and FACS analysis
A small amount of frozen culture was scraped from the top of frozen culture stocks to inoculate SD + CAA. After passaging overnight at 30 °C, cultures were diluted to an OD600 of 0.005-0.01 in SD + CAA and grown to an OD600 of 0.1–0.6. Cell cultures were then diluted 25-fold with SG + CAA (5 g/L casamino acids, 1.7 g/L yeast nitrogen base, 5.0 g/L ammonium sulfate, 10.2 g/L Na2HPO4-7H2O and 8.6 g/L NaH2PO4-H2O, 2% galactose) to induce peptide expression and grown for 20-24 hr at 30 °C. To measure binding to surface-displayed peptides, cells were filtered with a 96-well plate filter (10^5-10^6 cells/well), washed twice with 150 μL BSS (50 mM Tris pH 8, 100 mM NaCl, 1 mg/ml BSA), resuspended in BSS with at least 10-fold molar excess of target protein, and incubated in the filter plate for 2 h at room temperature with gentle shaking for equilibration. Binding of the designs to the five Bcl-2 family proteins was measured at 1000 nM, 100 nM, 10 nM, and 1 nM target protein. To detect cell surface expression and binding of target protein, cell suspensions were filtered, washed twice in chilled BSS, resuspended in 35 μL of a 1:100 dilution of primary antibodies (mouse anti-HA, Roche, RRID:AB_514505 and rabbit anti-c-myc antibodies, Sigma, RRID:AB_439680) in BSS, and incubated with gentle shaking for 15 min at 4 °C. Cells were then filtered, washed twice in 150 μL chilled BSS, resuspended in 35 μL of a solution of secondary antibodies in BSS (1:40 dilution of APC rat anti-mouse, BD, RRID:AB_398465 and 1:100 dilution of PE goat anti-rabbit, Sigma, RRID:AB_261257) and incubated with gentle shaking in the dark for 15 min at 4 °C. Cells were filtered and washed twice more in 150 μL chilled BSS to remove unbound antibodies. Labeled cells were resuspended in BSS and analyzed using a BD FACSCanto with FACSDiva software. The median binding signals of expression-positive cells are given in Table S6.
Crystallography
Crystals of Bfl-1 in complex with the designed peptides were grown in hanging drops. To set the drops, untagged Bfl-1 (8 mg/mL in 20 mM Tris, 150 mM NaCl, 1% glycerol, 1 mM DTT, pH 8.0) was mixed in equal molar ratio with the designed peptides. 1.5 μL of the Bfl-1/peptide mixture was pipetted onto a glass coverslip and mixed with 1.5 μL of well solution (1.8 - 2.0 M (NH4)2SO4, 50 mM MES pH 6.5). To cryoprotect the crystals, they were transferred into a solution of 2.0 M Li2SO4 with 10% glycerol. Crystals were flash frozen in liquid nitrogen. Diffraction data were collected at the Advanced Photon Source at the Argonne National Laboratory, NE-CAT beamline 24-ID-C. The datasets were refined to 1.59 Å and 1.75 Å and scaled using HKL2000 (Otwinowski and Minor, 1997). Phenix was used to phase with the Bfl-1 chain from PDB ID 5UUK (Jenson et al., 2017; McCoy et al., 2007). The peptides were modeled into the difference densities using Coot (Emsley et al., 2010). Iterative rounds of refinement and model building were performed using Phenix and Coot (Emsley et al., 2010; McCoy et al., 2007).
Crystals of Mcl-1 in complex with the designed peptides were grown in hanging drops. To set the drops, TCEP (100 mM) and ZnSO4 (50 mM) were added at 10% volume to untagged Mcl-1 (8.5 mg/mL in 20 mM Tris, 150 mM NaCl, 1% glycerol, 1 mM DTT, pH 8.0) before adding equal molar amounts of the designed peptides. To grow crystals of Mcl-1 in complex with dM1, 1.5 μL of the peptide protein mixture was mixed with 1.5 μL of well solution (25% PEG 3350, 50 mM BIS-Tris pH 8.5, 50 mM NH4CH3CO2). Crystals were cryoprotected by adding 3 μL of a solution of 37.5% glucose in 25% PEG 3350, 50 mM BIS-Tris pH 8.5, 50 mM NH4CH3CO2 directly to the drop 0.5 μL at a time. To grow crystals of Mcl-1 in complex with dM7, 2.5 μL of the peptide protein mixture was mixed with 0.5 μL of well solution (1.4 M sodium citrate pH 6.5, 0.1 M HEPES pH 7.5). For cryoprotection, crystals were transferred to 1.6 M sodium citrate pH 6.5, 0.1 M HEPES pH 7.5. Crystals were flash frozen in liquid nitrogen. Diffraction data were collected at the MIT x-ray core facility. The datasets were refined to 1.95 Å and 2.25 Å and scaled using HKL2000 (Otwinowski and Minor, 1997). Phenix was used to phase with the Mcl-1 chain from PDB ID 3PK1 (Czabotar et al., 2011; McCoy et al., 2007). The peptides were modeled into the difference densities using Coot (Emsley et al., 2010). Iterative rounds of refinement and model building were performed using Phenix and Coot (Emsley et al., 2010; McCoy et al., 2007).
QUANTIFICATION AND STATISTICAL ANALYSIS
Calculation of AUC and correlation values was done using Python libraries scikit-learn and SciPy. Table footnotes describe the quantities (standard deviations) used to report variance.
DATA AND SOFTWARE AVAILABILITY
To access and use dTERMen see grigoryanlab.org/dtermen. Scripts used for the prediction benchmark, protein structure files, predicted energy values, and experimental data can be downloaded from this GitHub repository: https://github.com/KeatingLab/dTERMen_design under the MIT License. The atomic coordinates of the reported structures have been deposited in the Protein Data Bank with accession codes 6MBB, 6MBC, 6MBD, and 6MBE.
Supplementary Material
Highlights.
Information in tertiary structural motifs is used to score protein complexes
Bcl-2 protein interactions are predicted and designed without atomistic modeling
Motif-based design generates high-affinity peptides quickly and reliably
High-resolution structures show how designs engage Bfl-1 and Mcl-1
Acknowledgements
This project was supported by NIGMS award R01 GM110048 to A. Keating and by NIH award P20-GM113132 and NSF award DMR1534246 to G. Grigoryan. J. Jenson was partially supported by a fellowship from the Koch Institute for Integrative Cancer Research, and V. Frappier was supported by NSERC and FRQNT postdoctoral funding. Part of this work was conducted at the Northeastern Collaborative Access Team beamlines, which are funded by the National Institute of General Medical Sciences from the National Institutes of Health (P30 GM124165). The Pilatus 6M detector on the 24-ID-C beamline is funded by an NIH-ORIP HEI grant (S10 RR029205). This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the U.S. Department of Energy.
We thank the Koch Institute Flow Cytometry Core Facility for assistance with FACS sorting, the MIT Structural Biology Core Facility and R. Grant for assistance with X-ray crystallography, and members of the Drennan laboratory for help with X-ray data collection. V. Xue provided FACS data-processing scripts.
Declaration of Interests
MIT and Dartmouth have filed a provisional patent application covering sequences disclosed here. Dartmouth College has filed a provisional patent to cover the design method behind dTERMen. GG is an employee and a shareholder in an early stage biotechnology startup that may use the technology; VF is a consultant for this venture.
References
- Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, Shapovalov MV, Renfrew PD, Mulligan VK, Kappel K, et al. (2017). The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput 13, 3031–3048.
- Arkadash V, Yosef G, Shirian J, Cohen I, Horev Y, Grossman M, Sagi I, Radisky ES, Shifman JM, and Papo N (2017). Development of High Affinity and High Specificity Inhibitors of Matrix Metalloproteinase 14 through Computational Design and Directed Evolution. J. Biol. Chem 292, 3481–3495.
- Berger S, Procko E, Margineantu D, Lee EF, Shen BW, Zelter A, Silva D-AA, Chawla K, Herold MJ, Garnier J-MM, et al. (2016). Computationally designed high specificity inhibitors delineate the roles of BCL2 family proteins in cancer. Elife 5, 1–31.
- Berman HM (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
- Bird GH, Mazzola E, Opoku-Nsiah K, Lammert MA, Godes M, Neuberg DS, and Walensky LD (2016). Biophysical determinants for cellular uptake of hydrocarbon-stapled peptide helices. Nat. Chem. Biol 12, 845–852.
- Cang S, Iragavarapu C, Savooji J, Song Y, and Liu D (2015). ABT-199 (venetoclax) and BCL-2 inhibitors in clinical development. J. Hematol. Oncol 8, 129.
- Chames P, Van Regenmortel M, Weiss E, and Baty D (2009). Therapeutic antibodies: successes, limitations and hopes for the future. Br. J. Pharmacol 157, 220–233.
- Chaudhury S, Lyskov S, and Gray JJ (2010). PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26, 689–691.
- Crooks GE, Hon G, Chandonia J-M, and Brenner SE (2004). WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190.
- Czabotar PE, Lee EF, Thompson GV, Wardak AZ, Fairlie WD, and Colman PM (2011). Mutation to Bax beyond the BH3 domain disrupts interactions with pro-survival proteins and promotes apoptosis. J. Biol. Chem. 286, 7123–7131.
- Davey JA, and Chica RA (2014). Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles. Proteins Struct. Funct. Bioinforma 82, 771–784.
- Debartolo J, Dutta S, Reich L, and Keating AE (2012). Predictive Bcl-2 family binding models rooted in experiment or structure. J. Mol. Biol 422, 124–144.
- DeBartolo J, Taipale M, and Keating AE (2014). Genome-Wide Prediction and Validation of Peptides That Bind Human Prosurvival Bcl-2 Proteins. PLOS Comput. Biol 10, e1003693.
- Dutta S, Gullá S, Chen TS, Fire E, Grant RA, and Keating AE (2010). Determinants of BH3 binding specificity for Mcl-1 versus Bcl-xL. J. Mol. Biol 398, 747–762.
- Dutta S, Chen TS, and Keating AE (2013). Peptide ligands for pro-survival protein Bfl-1 from computationally guided library screening. ACS Chem. Biol. 8, 778–788.
- Emsley P, Lohkamp B, Scott WG, and Cowtan K (2010). Features and development of Coot. Acta Crystallogr. D. Biol. Crystallogr 66, 486–501.
- Feng X, and Barth P (2016). A topological and conformational stability alphabet for multipass membrane proteins. Nat. Chem. Biol 12, 167–173.
- Fernandez-Fuentes N, Oliva B, and Fiser A (2006). A supersecondary structure library and search algorithm for modeling loops in protein structures. Nucleic Acids Res. 34, 2085–2097.
- Fire E, Gullá SV, Grant RA, and Keating AE (2010). Mcl-1-Bim complexes accommodate surprising point mutations via minor structural changes. Protein Sci. 19, 507–519.
- Foight GW, Ryan JA, Gullá SV, Letai A, and Keating AE (2014). Designed BH3 peptides with high affinity and specificity for targeting Mcl-1 in cells. ACS Chem. Biol 9, 1962–1968.
- Frappier V, Duran M, and Keating AE (2018). PixelDB: Protein-peptide complexes annotated with structural conservation of the peptide binding mode. Protein Sci. 27, 276–285.
- Gai SA, and Wittrup KD (2007). Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol 17, 467–473.
- Gautier R, Douguet D, Antonny B, and Drin G (2008). HELIQUEST: a web server to screen sequences with specific alpha-helical properties. Bioinformatics 24, 2101–2102.
- Grigoryan G, Reinke AW, and Keating AE (2009). Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature 458, 859–864.
- Henikoff S, and Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U. S. A. 89, 10915–10919.
- Hiraki M, Maeda T, Mehrotra N, Jin C, Alam M, Bouillez A, Hata T, Tagde A, Keating A, Kharbanda S, et al. (2018). Targeting MUC1-C suppresses BCL2A1 in triple-negative breast cancer. Signal Transduct. Target. Ther 3, 13.
- Jacobs TM, Williams B, Williams T, Xu X, Eletsky A, Federizon JF, Szyperski T, and Kuhlman B (2016). Design of structurally distinct proteins using strategies inspired by evolution. Science 352, 687–690.
- Jenson JM, Ryan JA, Grant RA, Letai A, and Keating AE (2017). Epistatic mutations in PUMA BH3 drive an alternate binding mode to potently and selectively inhibit anti-apoptotic Bfl-1. Elife 6, 1–23.
- Jenson JM, Xue V, Stretz L, Mandal T, Reich L. “Luther,” and Keating AE (2018). Peptide design by optimization on a data-parameterized protein interaction landscape. Proc. Natl. Acad. Sci 201812939.
- Kingsford CL, Chazelle B, and Singh M (2005). Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics 21, 1028–1039.
- Kotschy A, Szlavik Z, Murray J, Davidson J, Maragno AL, Le Toumelin-Braizat G, Chanrion M, Kelly GL, Gong J-N, Moujalled DM, et al. (2016). The MCL1 inhibitor S63845 is tolerable and effective in diverse cancer models. Nature 538, 477–482.
- Kumar M, Gupta D, Singh G, Sharma S, Bhat M, Prashant CK, Dinda AK, Kharbanda S, Kufe D, and Singh H (2014). Novel polymeric nanoparticles for intracellular delivery of peptide Cargos: antitumor efficacy of the BCL-2 conversion peptide NuBCP-9. Cancer Res. 74, 3271–3281.
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948.
- Lee EF, Fedorova A, Zobel K, Boyle MJ, Yang H, Perugini MA, Colman PM, Huang HCS, Deshayes K, and Fairlie WD (2009). Novel Bcl-2 homology-3 domain-like sequences identified from screening randomized peptide libraries for inhibitors of the pro-survival Bcl-2 proteins. J. Biol. Chem. 284, 31315–31326.
- Lewis SM, and Kuhlman BA (2011). Anchored design of protein-protein interfaces. PLoS One 6, e20872.
- Mackenzie CO, and Grigoryan G (2017). Protein structural motifs in prediction and design. Curr. Opin. Struct. Biol 44, 161–167.
- Mackenzie CO, Zhou J, and Grigoryan G (2016). Tertiary alphabet for the observable protein structural universe. Proc. Natl. Acad. Sci. U. S. A. 113, E7438–E7447.
- McConkey BJ, Sobolev V, and Edelman M (2003). Discrimination of native protein structures using atom-atom contact scoring. Proc. Natl. Acad. Sci. U. S. A. 100, 3215–3220.
- McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, and Read RJ (2007). Phaser crystallographic software. J. Appl. Crystallogr 40, 658–674.
- Miles JA, Yeo DJ, Rowell P, Rodriguez-Marin S, Pask CM, Warriner SL, Edwards TA, and Wilson AJ (2016). Hydrocarbon constrained peptides - understanding preorganisation and binding affinity. Chem. Sci 7, 3694–3702.
- Muñoz V, and Serrano L (1997). Development of the multiple sequence approximation within the AGADIR model of alpha-helix formation: comparison with Zimm-Bragg and Lifson-Roig formalisms. Biopolymers 41, 495–509.
- Needleman SB, and Wunsch CD (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol 48, 443–453.
- Nischan N, Herce HD, Natale F, Bohlke N, Budisa N, Cardoso MC, and Hackenberger CPR (2015). Covalent attachment of cyclic TAT peptides to GFP results in protein delivery into live cells with immediate bioavailability. Angew. Chem. Int. Ed. Engl 54, 1950–1953.
- Opferman JT (2016). Attacking cancer’s Achilles heel: antagonism of anti-apoptotic BCL-2 family members. FEBS J. 283, 2661–2675.
- Otwinowski Z, and Minor W (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.
- Procko E, Berguig GY, Shen BW, Song Y, Frayo S, Convertine AJ, Margineantu D, Booth G, Correia BE, Cheng Y, et al. (2014). A computationally designed inhibitor of an Epstein-Barr viral Bcl-2 protein induces apoptosis in infected cells. Cell 157, 1644–1656.
- Qian Z, Martyna A, Hard RL, Wang J, Appiah-Kubi G, Coss C, Phelps MA, Rossman JS, and Pei D (2016). Discovery and Mechanism of Highly Efficient Cyclic Cell-Penetrating Peptides. Biochemistry 55, 2601–2612.
- Reich L, Dutta S, and Keating AE (2015). SORTCERY - A High-Throughput Method to Affinity Rank Peptide Ligands. J. Mol. Biol. 427, 2135–2150.
- Rezaei Araghi R, Bird GH, Ryan JA, Jenson JM, Godes M, Pritz JR, Grant RA, Letai A, Walensky LD, and Keating AE (2018). Iterative optimization yields Mcl-1-targeting stapled peptides with selective cytotoxicity to Mcl-1-dependent cancer cells. Proc. Natl. Acad. Sci. U. S. A. 115, E886–E895.
- Roberts KE, Cushing PR, Boisguerin P, Madden DR, and Donald BR (2012). Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLOS Comput. Biol 8, e1002477.
- Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, and Serrano L (2005). The FoldX web server: An online force field. Nucleic Acids Res. 33, W382–W388.
- Souers AJ, Leverson JD, Boghaert ER, Ackler SL, Catron ND, Chen J, Dayton BD, Ding H, Enschede SH, Fairbrother WJ, et al. (2013). ABT-199, a potent and selective BCL-2 inhibitor, achieves antitumor activity while sparing platelets. Nat. Med 19, 202–208.
- Tompa P, Davey NE, Gibson TJ, and Babu MM (2014). A million peptide motifs for the molecular biologist. Mol. Cell 55, 161–169.
- UniProt Consortium, T. (2018). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699.
- Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, and Schymkowitz J (2011). BriX: A database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res. 39, 435–442.
- Walensky LD, and Bird GH (2014). Hydrocarbon-Stapled Peptides: Principles, Practice, and Progress. J. Med. Chem 57, 6275–6288.
- Wang S, Peng J, and Xu J (2011). Alignment of distantly related protein structures: Algorithm, bound and implications to homology modeling. Bioinformatics 27, 2537–2545.
- Zheng F, and Grigoryan G (2017). Sequence statistics of tertiary structural motifs reflect protein stability. PLoS One 12, 1–25.
- Zheng F, Zhang J, and Grigoryan G (2015a). Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships. Structure 23, 961–971.
- Zheng F, Jewell H, Fitzpatrick J, Zhang J, Mierke DF, and Grigoryan G (2015b). Computational design of selective peptides to discriminate between similar PDZ domains in an oncogenic pathway. J. Mol. Biol 427, 491–510.
- Zhou J, Panaitiu AE, and Grigoryan G (2018). A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. BioRxiv. doi: 10.1101/431635