Crystallography of biological macromolecules can be unpredictable, and protein crystals do not consistently produce high-resolution diffraction data. This study sought to develop a computational design approach that could identify resolution-enhancing point mutations at crystallographic protein–protein interfaces.
Keywords: protein crystallography, protein design, Rosetta, staphylococcal nuclease
Abstract
Substantial advances have been made in the computational design of protein interfaces over the last 20 years. However, the interfaces targeted by design have typically been stable and high-affinity. Here, we report the development of a generic computational design method to stabilize the weak interactions at crystallographic interfaces. Initially, we analyzed structures reported in the Protein Data Bank to determine whether crystals with more stable interfaces result in higher resolution structures. We found that for 22 variants of a single protein crystallized by a single individual, the Rosetta-calculated ‘crystal score’ correlates with the reported diffraction resolution. We next developed and tested a computational design protocol, seeking to identify point mutations that would improve resolution in a highly stable variant of staphylococcal nuclease (SNase). Using a protocol based on fixed protein backbones, only one of the 11 initial designs crystallized, indicating modeling inaccuracies and forcing us to re-evaluate our strategy. To compensate for slight changes in the local backbone and side-chain environment, we subsequently designed on an ensemble of minimally perturbed protein backbones. Using this strategy, four of the seven designed proteins crystallized. By collecting diffraction data from multiple crystals per design and solving crystal structures, we found that the designed crystals improved the resolution modestly and in unpredictable ways, including altering the crystal space group. Post hoc, in silico analysis of the three observed space groups for SNase showed that the native space group was the lowest scoring for four of six variants (including the wild type), but that resolution did not correlate with crystal score, as it did in the preliminary results. Collectively, our results show that calculated crystal scores can correlate with reported resolution, but that the correlation is absent when the problem is inverted. 
This outcome suggests that more comprehensive modeling of the crystallographic state is necessary to design high-resolution protein crystals from poorly diffracting crystals.
1. Introduction
X-ray crystallography is still the primary method for acquiring atomic resolution structural information about biological macromolecules such as proteins, and it is indispensable for gaining functional and mechanistic insights across biological and pharmacological disciplines (Giegé, 2013 ▸). However, because of its highly unpredictable nature, crystallography is viewed more as art than as science, a fact reflected by the low rate of success in large-scale protein-crystallization efforts (∼10–20%; Fusco et al., 2014 ▸; Price et al., 2009 ▸). Even when proteins produce diffraction-quality crystals, the diffraction patterns may be of low quality or difficult to solve. Approximately 23% of the crystal structures reported in the PDB have a reported resolution of 2.5 Å or worse (Berman et al., 2000 ▸). At a resolution of 2.5 Å, the backbone, side chains and small molecules can be fitted to the electron density with a reasonable degree of precision; however, key features such as the placement of water molecules or alternate side-chain conformations may be less certain. At even lower resolutions (3–6 Å), ligands/side chains and even the main chain may not be fitted reliably (Richardson, 1981 ▸; Wlodawer et al., 2008 ▸; Knight et al., 1989 ▸; Rupp & Segelke, 2001 ▸). An inability to resolve ligands, water molecules, small molecules or side-chain interactions prevents the accurate understanding of catalytic mechanisms, drug–protein interactions or the organization of certain macromolecular complexes, and precludes computational design from using natural proteins as input.
Historically, rational design has been used to overcome various degrees of protein recalcitrance to crystallization, from improving existing crystals to generating new ones (Dale et al., 2003 ▸). The variety in strategies has been quite broad. Some strategies can be applied when only the protein sequence is known, even before crystal trays have been set up; for example, by deleting loops or regions of low sequence complexity (Evdokimov et al., 2008 ▸) or by identifying stabilizing mutations from homologous sequences (Lawson et al., 1991 ▸), surface-entropy reduction (SER; Cooper et al., 2007 ▸) or de novo crystal design (Lanci et al., 2012 ▸). Other strategies have focused on improving an existing crystal, such as through the rational engineering of crystal contacts (Oubridge et al., 1995 ▸; Lai et al., 2019 ▸). Of the above strategies, all but SER must be tailored to a specific target protein. The necessity for protein-specific approaches is somewhat surprising considering that the underlying physics of crystallization is universal. For example, Fusco and coworkers identified two generic mechanisms underlying crystal formation in their analysis of 182 proteins in 1536 crystallization conditions (Fusco et al., 2013 ▸). In principle, a reliable and general method for enhancing the resolution of poorly diffracting crystals through rational and computational design should exist.
Here, we report our attempts to develop resolution-enhancing computational design of protein–protein interactions at crystallographic interfaces. We began by identifying the physical determinants of high-resolution protein crystals. This led us to identify a positive correlation between reported resolution and crystal lattice ‘stability’ (the Rosetta-determined score of the asymmetric unit and the unique protein–protein interactions defining the crystal lattice). A Rosetta protocol was then developed to identify stabilizing (resolution-enhancing) point mutations. This protocol was benchmarked in silico against rationally engineered protein crystals (Mizutani et al., 2008 ▸). We tested our protocol experimentally by designing, cloning, expressing, purifying and crystallizing variants of a highly stable form of staphylococcal nuclease (SNase). We found that variants designed on a single, fixed backbone rarely crystallized (1/11), whereas variants designed on an ensemble of backbones crystallized more readily (4/7). Comparison of the highest resolution shells for collected diffraction data (determined by CC1/2) revealed only minor improvements in resolution (∼0.05 Å) for three of the five designs that crystallized. Surprisingly, two of the resolution-enhancing designs altered the crystal space group. An analysis of our efforts shows that space-group changes could have been predicted for three of the five designs, but that calculated crystal lattice stability does not correlate with resolution for our test protein.
2. Materials and methods
All data associated with this paper are available online via Zenodo at the following DOIs: https://doi.org/10.5281/zenodo.3216968, https://doi.org/10.5281/zenodo.3222946, https://doi.org/10.5281/zenodo.3228344, https://doi.org/10.5281/zenodo.3228838, https://doi.org/10.5281/zenodo.3235486, https://doi.org/10.5281/zenodo.3235518 and https://doi.org/10.5281/zenodo.3228351.
2.1. Curation of crystal data sets
Three protein structure data sets were constructed on October 10th 2016 from the Protein Data Bank (PDB; Berman et al., 2000 ▸), termed the ‘PDB representative’, ‘SNase’ and ‘Mizutani’ sets.
To generate the PDB representative set, we first generated three lists of PDB IDs and then took the intersection of the lists. The first list ensured that we only analyzed nonredundant, reasonable-quality protein structures. Using the PISCES server (Wang & Dunbrack, 2003), we generated a list of non-Cα-only X-ray structures in the PDB adhering to the following criteria (culling by chain): (i) 25% maximum sequence identity, (ii) reported resolution better than 3.0 Å, (iii) R value < 0.3 and (iv) proteins comprising more than 40 but fewer than 10 000 residues.
This list, pisces.txt, contained 10 886 PDB IDs. Next, we generated a second list to limit our analysis to solely crystallographic protein–protein interactions. Using the advanced search option on the PDB website (https://www.rcsb.org/pdb/search/advSearch.do), we generated a list of PDB structures with monomeric stoichiometry and only a single chain in both the biological and asymmetric units. This list, pdb.txt, contained 36 899 PDB IDs. Finally, we generated a third list to exclude structures containing many ligands or nonprotein atoms. Starting with the PDB IDs from the pisces.txt list, we used a Python script (1-parse-pdb-remarks.py) to filter PDB IDs, selecting for the absence of REMARK 465/470/475/480 records, which indicate missing atoms, and a fraction of non-HOH HETATM records greater than 0.1. This list, missing_or_nonhet.txt, contained 873 PDB IDs. We took the intersection of our three lists of PDB IDs as our PDB representative set; this was performed in R using the merge-three-lists.R script. The final list contained 379 PDB IDs.
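The record-based filtering and the merging of ID lists described above can be sketched in Python as follows. This is an illustrative sketch only and is not the 1-parse-pdb-remarks.py or merge-three-lists.R scripts used in this study: the function names are hypothetical, the denominator chosen for the HETATM fraction (ATOM records plus non-water HETATM records) is an assumption, and the thresholds are deliberately left to the caller.

```python
# REMARK numbers that flag missing residues or atoms, as listed in the text.
MISSING_REMARKS = {"465", "470", "475", "480"}


def summarize_pdb_text(pdb_text):
    """Return (has_missing_remark, non_water_hetatm_fraction) for PDB-format text.

    The fraction is computed over ATOM plus non-HOH HETATM records, which is
    an assumption; the original script may define it differently.
    """
    has_missing_remark = False
    n_atom = 0
    n_hetatm = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "REMARK":
            fields = line.split()
            if len(fields) > 1 and fields[1] in MISSING_REMARKS:
                has_missing_remark = True
        elif record == "ATOM":
            n_atom += 1
        # Residue name occupies columns 18-20 of a PDB coordinate record.
        elif record == "HETATM" and line[17:20].strip() != "HOH":
            n_hetatm += 1
    total = n_atom + n_hetatm
    fraction = n_hetatm / total if total else 0.0
    return has_missing_remark, fraction


def intersect_id_lists(*id_lists):
    """Return the sorted PDB IDs present in every list (case-insensitive)."""
    if not id_lists:
        return []
    id_sets = [{pdb_id.strip().upper() for pdb_id in ids if pdb_id.strip()}
               for ids in id_lists]
    return sorted(set.intersection(*id_sets))
```

Returning the raw fraction rather than a pass/fail flag keeps the cutoff decision separate from the parsing, so the same summary can be reused with different thresholds.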
Separately, to generate the SNase set, we used a new PDB advanced search to select for X-ray structures with UniProt Accession ID P00644 (the identifier for SNase), in addition to the above criteria for stoichiometry, biological unit and asymmetric unit, but not culling for sequence identity, R value < 0.3, the absence of atoms or the presence of ligands. The SNase set contained 256 PDB IDs which were not present in the PDB representative set.
Finally, the Mizutani set was simply composed of the 21 structures of diphthine synthase deposited by Mizutani and coworkers in their study of rational crystal contact engineering (Mizutani et al., 2008 ▸). These structures were not present in either the PDB representative or the SNase set.
2.2. Modeling of crystals
In this study, we sought to computationally quantify crystallographic protein–protein interactions. To this end, proteins were modeled in three states: (i) as a crystal, including all symmetry mates within 12 Å, (ii) as a collection of pairwise interfaces and (iii) as a monomer. These states were constructed for each data set. Furthermore, pairwise interfaces were analyzed with the Evolutionary Protein–Protein Interface Classifier (EPPIC; Bliven et al., 2018 ▸) to verify that the interactions we were assessing were crystallographic and not biological.
2.2.1. PDB representative
Monomers were downloaded from the PDB and energy-minimized using Rosetta, weekly version v.2018.24. To ensure accurate energy calculation and because Rosetta cannot model all ligands/cofactors etc., HETATM records were omitted. Energy minimization was performed by the FastRelax protocol (Conway et al., 2014 ▸; Nivón et al., 2013 ▸) with the command line
    relax.linuxgccrelease -l list.txt -relax:constrain_relax_to_start_coords -relax:ramp_constraints -ex1 -ex2 -use_input_sc -flip_NHQ -no_optH false -nstruct 10 -out:pdb_gz
where list.txt contains the PDB IDs. Individual crystallographic interfaces were generated and analyzed using the pre-compiled EPPIC command-line interface (v.3.0.5),
    eppic -i 1ABC.pdb -l -p -s
where 1ABC.pdb is any PDB file. Any PDB ID with an interface predicted by EPPIC to be biological and not crystallographic was excluded from further analysis if the corresponding PDB entry or supporting literature indicated that the biologically relevant state was not monomeric (as we wished to study only crystallographic interactions). Crystals were modeled using the same FastRelax protocol, except that the -symmetry:symmetry_definition CRYST1 flag was included to enable modeling of the asymmetric unit and all symmetry mates within 12 Å as described previously (DiMaio, Leaver-Fay et al., 2011).
The energy of crystallization, or ‘crystal score’, was computationally determined using the weekly PyRosetta (Chaudhury et al., 2010 ▸) release v.2018.24 and the November 2016 version of the Rosetta scoring function (Alford et al., 2017 ▸; Park et al., 2016 ▸). The script score_crystal_interfaces_parallel.py evaluated the energy of each crystallographic interface, which was later combined with the energy of the monomer to yield the crystal score (see Section 3.1). After excluding four structures that could not be modeled with our approach and 12 structures that accidentally included biological interactions in the crystal, 364 PDB entries were analyzed from the PDB. These PDBs are listed in pdb-representative-rosetta.txt.
2.2.2. SNase
As above, SNase monomers were downloaded from the PDB and energy-minimized using Rosetta v.2018.24-dev60250. Unlike above, HETATM records were retained because the nucleotide analog thymidine-3′,5′-diphosphate (THP) bound by the enzyme makes important crystal contacts. The ligand geometry was fixed in simulations and read from the PDB Chemical Components Dictionary. Energy minimization was performed as described above. Individual crystallographic interfaces were not generated and analyzed using EPPIC because SNase is known to be a monomer. Crystals were modeled and crystal scores were computed as described above.
2.2.3. Mizutani
Modeling of the 21 diphthine synthase structures of Mizutani and coworkers was performed similarly to the modeling of the PDB representative and SNase crystal structures, as described above. The only exception was that diphthine synthase is naturally a dimer, so during energy evaluation the two dimeric chains were treated as a monomer and only crystallographic, and not biological, interfaces were considered. As with SNase, because the biological state is known, EPPIC was not used to generate or analyze crystallographic interfaces. Crystals were modeled and crystal scores were computed as described above.
2.3. Forward design of diphthine synthase
To determine the computational design approach that was most likely to be experimentally successful, we tested several strategies on diphthine synthase. Since the resolution is reported for 22 variants, we can ask whether a particular design strategy can predict resolution-improving variants. This approach is known as forward design. The energy-minimized structure of wild-type diphthine synthase (PDB entry 1wng; Kishishita et al., 2008 ▸) was used as input for design. At each of the positions mutated by Mizutani and coworkers (26, 49, 54, 65, 69, 79, 140, 142, 146, 171, 173, 187 and 261), each amino acid except cysteine or proline was tested. We attempted to accommodate the mutation by permitting varying degrees of freedom in the wild-type crystal form (simulated by Rosetta Symmetry). These different design strategies ranged from only permitting neighboring side chains to repack to redocking in the crystal lattice (see Section 3). All energy calculations were performed in the crystal form and averaged over ten repeats.
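The set of candidate point mutations scanned in this forward-design test can be enumerated with a few lines of Python. The sketch below is illustrative rather than the code used in this study; the function name is hypothetical, and because the wild-type sequence is not given here the wild-type identity at each position is not removed from the list.

```python
# The 20 canonical amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Cysteine and proline were not tested, as stated in the text.
EXCLUDED = {"C", "P"}

# Positions mutated by Mizutani and coworkers, as listed in the text.
POSITIONS = [26, 49, 54, 65, 69, 79, 140, 142, 146, 171, 173, 187, 261]


def enumerate_point_mutations(positions, excluded=EXCLUDED):
    """Return (position, amino_acid) pairs for every allowed substitution."""
    allowed = [aa for aa in AMINO_ACIDS if aa not in excluded]
    return [(pos, aa) for pos in positions for aa in allowed]


candidates = enumerate_point_mutations(POSITIONS)
```

Each pair would then be modeled in the crystal form and scored over ten repeats, as described above.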
2.4. Computational design of SNase
The design of SNase variants initially followed our most successful diphthine synthase approach: introducing a point mutation at a position followed by repacking of side chains and by energy minimization of side-chain and backbone dihedral angles. An energy-minimized structure of Δ+PHS SNase (PDB entry 3bdc; Castañeda et al., 2009 ▸) was used as input. The geometry of the THP ligand was held fixed during the simulation and defined by a Rosetta params file derived from the PDB coordinates. Later, we introduced an ensemble of 200 perturbed backbones generated by Rosetta Backrub (Smith & Kortemme, 2008 ▸), as it has been shown that interface ΔΔG prediction is more accurate when using an ensemble of backbones rather than just a single input (Barlow et al., 2018 ▸). The following steps were repeated for each member of the backbone ensemble and for every designable surface position, defined as residues having a Cα–Cα distance of less than 8 Å across any crystallographic interface. Firstly, the position was mutated to one of 18 amino acids (cysteine and proline were excluded). The crystal form was then generated using Rosetta Symmetry in the wild-type space group and unit-cell dimensions. Finally, side chains were repacked to accommodate the mutation in the crystal form. The error in this modeling was calculated across the ensemble of 200 backbones instead of by repeating the simulation ten times.
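The selection of designable positions and the ensemble-based error estimate described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the Rosetta protocol itself: coordinates are passed in as plain tuples (in practice they would come from the symmetric model), the function names are hypothetical, and the use of a sample standard deviation as the error measure is an assumption.

```python
import math
from statistics import mean, stdev


def designable_positions(asu_ca, mate_ca, cutoff=8.0):
    """Return residue numbers whose Cα lies within `cutoff` Å of any mate Cα.

    asu_ca  : dict mapping residue number -> (x, y, z) in the asymmetric unit
    mate_ca : iterable of (x, y, z) Cα coordinates from symmetry mates
    """
    mate_ca = list(mate_ca)
    selected = []
    for resnum, xyz in sorted(asu_ca.items()):
        if any(math.dist(xyz, other) < cutoff for other in mate_ca):
            selected.append(resnum)
    return selected


def ensemble_mean_and_sd(scores):
    """Summarize one mutation's scores across the backbone ensemble.

    Assumption: the spread is reported as a sample standard deviation.
    """
    return mean(scores), stdev(scores)
```

With an ensemble of 200 backbones, `ensemble_mean_and_sd` would receive 200 scores per mutation, one from each perturbed backbone.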
2.5. Cloning, expression and purification of proteins
Point mutations were introduced into the highly stable Δ+PHS variant (Castañeda et al., 2009 ▸) by QuikChange mutagenesis (Liu & Naismith, 2008 ▸) and the proteins were expressed in Escherichia coli BL21(DE3) cells transformed with the pET-24a(+) vector and purified as described previously (García-Moreno et al., 1997 ▸).
2.6. Protein crystallization
Crystals of Δ+PHS and its variants were grown by the hanging-drop vapor-diffusion method at 277 K. The reservoir solution varied, ranging in pH from 6 to 9, with 20–40%(v/v) 2-methyl-2,4-pentanediol (MPD), either three or two molar equivalents of THP, either two or one molar equivalents of CaCl2 and 25 mM potassium phosphate. The protein concentration varied across variants, but was always mixed in a 1:1 ratio with the reservoir solution to make the drop. Conditions for each crystal are detailed in Supplementary Table S1. Crystals typically appeared after one week, were harvested with Hampton Research CryoLoops on CrystalCap Copper HT magnetic sample mounts and were immediately flash-cooled in liquid nitrogen. Crystals were stored at 77 K until data collection.
2.7. Data collection and structure determination
All X-ray diffraction data were collected from single crystals at 77 K using a Rigaku FR-E SuperBright rotating-anode X-ray generator and a Dectris PILATUS 200K pixel-array detector. Diffraction data were indexed, integrated and scaled using the XDS package (Kabsch, 2010). Rp.i.m. calculations were carried out with SCALA (Evans, 2006). Phasing, model building and model refinement were performed using Phenix (Adams et al., 2010). Phasing was performed by molecular replacement in Phaser (McCoy et al., 2007) using PDB entry 3bdc as the search model with solvent-exposed or mutated side chains truncated to the Cα position to avoid biasing side-chain placement at crystal contact sites. Side chains were rebuilt using Coot (Emsley et al., 2010): initial placement was performed using the Mutate and Autofit function and was followed by manual refinement. Whole-structure refinement and water placement were performed using phenix.refine (Afonine et al., 2012) and phenix.rosetta_refine (DiMaio, Terwilliger et al., 2011). Data-collection and refinement statistics are shown in Supplementary Table S2. Crystal structures deposited in the PDB include models 6ok8 (K127L), 6u0w (K133M) and 6u0x (Q123D).
3. Results
Our primary goal in this effort was to develop a broadly applicable, Rosetta-based computational method for crystal contact design, with the goal of systematically predicting single point mutations that could enhance resolution. To this end, we first asked whether or not Rosetta scoring functions might correlate with the reported resolution.
3.1. For specific data sets, the Rosetta score correlates with the reported resolution
With over 129 000 crystal structures of biological macromolecules (Berman et al., 2000), the PDB provides a trove of data that can be used to determine whether or not the Rosetta score correlates with the reported resolution of protein crystals. To ensure a fair comparison, structures must first be energy-minimized in the Rosetta scoring function using the Rosetta FastRelax protocol (Nivón et al., 2013; Conway et al., 2014). As FastRelax runtime scales with protein size, testing every structure in the PDB is not feasible. Furthermore, some structures are overrepresented in the PDB, which might bias the analysis. Instead of analyzing all structures, we selected a diverse and representative subset of the PDB containing 364 structures, which had a maximum sequence identity of 25%, a reported resolution of better than 3 Å, R values of less than 0.3 and featured only crystallographic interactions (fully described in Sections 2.1 and 2.2). For every structure in the set, we generated all symmetry mates within 12 Å using Rosetta Symmetry (DiMaio, Leaver-Fay et al., 2011) and energy-minimized this ‘crystal’ form ten separate times using the default FastRelax protocol, which features four cycles of minimization each with progressively weaker harmonic constraints to prevent substantial deviation from the starting coordinates. Separately, we energy-minimized the monomeric form of the protein, which was also the asymmetric unit and the biologically relevant unit. We then approximated the ‘crystal score’ as $E_C = E_m + \sum_i E_i$, where $E_m$ is the average Rosetta score of ten energy-minimized monomers and $\sum_i E_i$ is the average Rosetta score of interface $i$ in the crystal form summed over all interfaces within 12 Å of the asymmetric unit. Hence, $E_C$ represents the energy of the minimal unit required to generate the crystal. The relationship between reported resolution and score is shown in Fig. 1(a).
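The crystal score described above can be illustrated with a short Python sketch. It assumes that the monomer term and the interface terms combine by simple addition, as the description of the score as ‘the monomer plus all crystallographic interactions’ suggests; the function name and input layout are hypothetical, and this is not the score_crystal_interfaces_parallel.py script.

```python
from statistics import mean


def crystal_score(monomer_scores, interface_scores):
    """Combine averaged monomer and interface scores into one crystal score.

    monomer_scores   : Rosetta scores from repeated minimizations of the monomer
    interface_scores : dict mapping an interface label to the scores of that
                       interface from repeated minimizations of the crystal form
    Assumption: the terms are combined by simple addition.
    """
    e_m = mean(monomer_scores)
    e_interfaces = sum(mean(scores) for scores in interface_scores.values())
    return e_m + e_interfaces
```

For example, with hypothetical monomer scores of −100 and −102 REU and two interfaces averaging −6 and −1 REU, the function returns −108 REU.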
Figure 1.
The Rosetta-calculated crystal score correlates with the reported resolution when comparing crystals of the same protein varying only by point mutations. The per-monomer crystal scores (i.e. the score of the monomer plus all crystallographic interactions, which is the minimal interacting unit required to generate the crystal) and reported crystal resolutions are compared for three sets of protein structures. The PDB representative set (a) samples 364 monomers with distinct sequences and shows no relationship between score and resolution. The SNase set (b) (excluding an outlier at 2.5 Å resolution) compares only crystals of S. aureus nuclease, attempting to rule out the protein as a variable, but still there is no trend. Finally, the diphthine synthase set (c) compares crystals that vary only by a point mutation, ruling out most extrinsic variables, and here score correlates with resolution.
While we anticipated that lower crystal score would correlate with lower numerical values of resolution (colloquially ‘higher’ resolution), we initially found a slight anticorrelation between reported resolution and score for the PDB representative set: higher resolution structures (lower values) had higher scores. We hypothesized that this unexpected result was caused by our inability to control for the many variables that affect reported resolution that are not captured by the Rosetta score (for example, how the highest resolution shell cutoff was decided, user handling of the crystal, the content of the reservoir solution etc.). To test this hypothesis, we analyzed two additional sets of crystal structures in the same manner as the PDB representative set. The first additional set that we analyzed controlled for the protein as a variable. We searched for a small, globular protein with many structures in the PDB that differed only slightly from each other but that spanned at least 1 Å in resolution. Of the multiple proteins fulfilling these criteria, we selected SNase, which had 256 crystal structures. We used Rosetta Symmetry and FastRelax to generate ten energy-minimized monomers and crystals, and computed the crystal score as described above. The SNase set did not show a strong correlation between reported resolution and score (Fig. 1 ▸ b). To further control extrinsic variables, we analyzed 22 variants of a single protein (diphthine synthase) that had been previously cloned, expressed, purified and crystallized and the crystal structures solved by one scientist in a crystal-engineering study by Mizutani et al. (2008 ▸). Despite the variants only differing by a point mutation or two, the crystal structures spanned a range of reported resolutions from 1.5 to 2.3 Å. The structures were energy-minimized and the crystal scores calculated in a similar fashion as for the previous two sets. 
The Mizutani set showed a strong correlation (R² = 0.8) between the Rosetta score and the reported resolution (Fig. 1(c)).
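A correlation of this kind can be quantified with a squared Pearson coefficient. The sketch below is illustrative and assumes that the reported R² corresponds to a simple linear relationship between crystal score and resolution; the function name is hypothetical, and no data from this study are reproduced.

```python
def r_squared(x, y):
    """Return the squared Pearson correlation coefficient of two sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("both inputs must vary")
    return s_xy ** 2 / (s_xx * s_yy)
```

Here `x` would hold the per-structure crystal scores and `y` the corresponding reported resolutions.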
3.2. Rosetta score can identify the resolution-enhancing mutations of Mizutani and coworkers
Since low score corresponded to high resolution for the Mizutani set, we next sought a fast computational design strategy that could identify resolution-enhancing mutations from the wild-type (WT) crystal structure. We tested six strategies on the Mizutani set in an approach known as forward design. Using the energy-minimized crystal form of the WT protein as input, we introduced point mutations one by one at the positions engineered by Mizutani and coworkers. We then optimized the side-chain dihedral angles while keeping the backbone fixed (side-chain repacking; Leaver-Fay et al., 2008 ▸). Following repacking, we tested six design strategies with varying degrees of freedom: (1) we did nothing else before evaluating the score, (2) we applied gradient-based energy minimization on side-chain dihedral angles, (3) we applied gradient-based energy minimization on side-chain and backbone dihedral angles, (4) we applied gradient-based energy minimization on side-chain dihedral angles and the relative position/orientation of the protein and its symmetry mates, (5) we applied gradient-based energy minimization on side-chain and backbone dihedral angles and the relative position/orientation of the protein and its symmetry mates and (6) we sampled the relative position/orientation of the protein and its symmetry mates, translating in steps of 0.05 Å and rotating in steps of 0.1°, followed by energy minimization as in strategy 5 over four Monte Carlo cycles. All gradient-based minimization was run until convergence was achieved, which was defined as a change in Rosetta score of less than 0.00001 following an iteration of minimization, or for 200 iterations. Each strategy was tested ten times to assess the error. The forward-design results for all strategies are shown in Supplementary Fig. S1.
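The stopping rule used for gradient-based minimization (a score change below 0.00001 or a cap of 200 iterations) can be illustrated on a toy problem. In the sketch below, plain fixed-step gradient descent stands in for the Rosetta minimizer, which is not reproduced here; the function name, step size and quadratic test function are all assumptions made for illustration.

```python
def minimize_until_converged(score_fn, grad_fn, x0, step=0.1,
                             tol=1e-5, max_iter=200):
    """Fixed-step gradient descent with the stopping rule described in the text.

    Stops when the score changes by less than `tol` between iterations or
    after `max_iter` iterations, whichever comes first.
    Returns (coordinates, final score, iterations used).
    """
    x = list(x0)
    score = score_fn(x)
    for iteration in range(1, max_iter + 1):
        gradient = grad_fn(x)
        x = [xi - step * gi for xi, gi in zip(x, gradient)]
        new_score = score_fn(x)
        converged = abs(score - new_score) < tol
        score = new_score
        if converged:
            return x, score, iteration
    return x, score, max_iter


# Toy usage: a quadratic bowl standing in for an energy landscape.
def toy_score(x):
    return sum(xi * xi for xi in x)


def toy_gradient(x):
    return [2.0 * xi for xi in x]


final_x, final_score, n_iter = minimize_until_converged(
    toy_score, toy_gradient, [1.0, -2.0])
```

On this toy function the tolerance is reached well before the iteration cap; on a rugged energy landscape the cap may instead be the limiting criterion.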
We found strategy 3 (minimizing on side-chain and backbone torsion angles after repacking) to be the most successful. Fig. 2 compares the predicted change in score between each variant and WT diphthine synthase for strategy 3. This approach identifies six score-reducing mutations, all of which are reported to enhance resolution, but misses the remaining 11 resolution-enhancing mutations identified by Mizutani and coworkers (lower and upper left-hand quadrants of Fig. 2); that is, it recovers six of the 17 known resolution-enhancing mutations (35.3%). This approach also identifies 46 other mutations with a lower score than that of the WT (points not shown in Fig. 2); however, these latter mutations were not experimentally characterized by Mizutani and coworkers, so it is unclear whether these mutations are resolution-enhancing. In a worst-case scenario, in which the uncharacterized point mutations are not resolution-enhancing, our design approach would identify six resolution-enhancing mutations out of 52 predicted score-reducing mutations, or 11.5%. Both success rates [35.3% (6/17) and 11.5% (6/52)] compare favorably with historical protein-interface design success rates, which are typically less than 10% (Stranges & Kuhlman, 2013).
Figure 2.
Forward design on diphthine synthase suggests that Rosetta can successfully identify mutations that improve or worsen resolution. The plot shows the difference in score (in the crystal form) between the WT and various designs versus the reported resolution, with the solid lines indicating the WT values. The filled circles are mutations where the score correctly predicts the sign of the resolution change (i.e. a better score than the WT results in a better resolution than the WT and vice versa), whereas the unfilled circles are mutations where the score incorrectly predicts the resolution change. Standard deviations in score are calculated from ten repeats of the design simulation. REU, Rosetta energy units.
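The filled/unfilled classification used in this figure amounts to a sign comparison, which can be sketched as follows. The function names are hypothetical, both differences are assumed to be taken as variant minus WT, and a difference of exactly zero is treated here as ‘not improved’, which is an arbitrary choice.

```python
def score_predicts_resolution(delta_score, delta_resolution):
    """Return True when the score change and resolution change agree in sign.

    A negative delta_score (better score) should pair with a negative
    delta_resolution (numerically lower, i.e. better, resolution).
    """
    return (delta_score < 0) == (delta_resolution < 0)


def tally_predictions(variants):
    """Count (correct, incorrect) predictions over (delta_score, delta_resolution) pairs."""
    variants = list(variants)
    n_correct = sum(1 for d_score, d_res in variants
                    if score_predicts_resolution(d_score, d_res))
    return n_correct, len(variants) - n_correct
```

Pairs counted as correct correspond to the filled circles and pairs counted as incorrect to the unfilled circles.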
3.3. Rosetta-designed crystals do not significantly improve the resolution
To determine whether our design approach was applicable to other proteins, we tested it on a model system, Δ+PHS, a well characterized highly stable variant of staphylococcal nuclease (SNase; Castañeda et al., 2009). We identified candidate designable residues at crystallographic interfaces as those with a Cα–Cα distance of less than 8 Å to neighboring symmetry mates. At each position, we introduced a point mutation followed by side-chain repacking and energy minimization of side-chain and backbone dihedral angles. We selected 11 designs for experimental characterization. However, only a single variant out of the 11 crystallized in conditions in which the WT protein normally crystallizes. Since we wanted our approach to yield crystals without having to reoptimize crystal-growth conditions, we sought to improve the crystallization rate of our designs. To this end, we introduced a step to generate backbone diversity before design. We drew inspiration from recent work showing that interfacial ΔΔG calculations are more accurate when the change in energy is computed across an ensemble of models (Barlow et al., 2018). Since backbone diversity was introduced beforehand, we were more conservative in our approach and we followed the introduction of point mutations with only side-chain repacking (strategy 1 in Section 3.2). From the second round of design we identified seven possible variants, but since two overlapped with those found in the first round, only the five new variants were characterized experimentally. With this design approach, four of the seven variants yielded crystals in WT-like conditions; thus, designing on an ensemble of structures had improved our crystallization rate from 9% to 57%.
Next, we determined the resolution of the diffraction data collected for our variants and compared it with that of the WT protein. To control for differences across crystals, we collected full diffraction data sets for at least three crystals of each variant (up to a maximum of 15), depending on the propensity of each variant to form diffraction-quality crystals. The K127L variant, in particular, affected crystal growth and nucleation significantly, yielding larger crystals across more conditions than the other variants. We then indexed, integrated and scaled the data sets using XDS. The most likely space group and unit cell were determined by POINTLESS (Evans, 2006). All variants except K64R and K127L had the WT space group (P2₁) and unit cell. We found that the K64R variant crystallized in space group P2₁2₁2₁ and the K127L variant crystallized in space group P4₁, with identical unit-cell dimensions to those previously observed. P2₁2₁2₁ and P4₁ are the third and second most common space groups for SNase crystals, respectively. In total, we were able to process the data for 37 of the 43 crystal diffraction patterns collected, with the remaining six data sets failing to index owing to issues such as ice rings or poor spot profiles. We report a summary of the collected and processed diffraction data in Supplementary Table S1.
Following processing with XDS, we identified the highest resolution shell as the shell with the highest resolution that still has a significant CC1/2 (p < 0.01). We used CC1/2, or the correlation between intensities when the data are split in half, to select the highest resolution shell because it provides a rigorous statistical cutoff (Karplus & Diederichs, 2015). To compute significance we calculated a t-value, $t = r\sqrt{(n - 2)/(1 - r^{2})}$, where r is the CC1/2 value and n is the number of reflection pairs (the degrees of freedom), and compared it with Student’s t-distribution with the same degrees of freedom (Karplus & Diederichs, 2012). We found that on average three of the five designs (60%; Q123D, K64R and K127L) achieved a higher resolution than the WT (Fig. 3). However, the improvement was minimal (<0.05 Å). In general, the variant resolutions fell within a very narrow range, 1.67–1.85 Å, the width of which was only slightly greater than the range typically spanned by the resolutions of multiple crystals of the same variant (∼0.1 Å). Other measures of crystal quality such as I/σ (Supplementary Fig. S2) and Rp.i.m. (Supplementary Fig. S3) showed qualitatively similar trends.
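The significance test and shell selection described above can be sketched in Python. This is a simplified illustration and not the procedure implemented in XDS: the function names are hypothetical, the shell values in the usage example are invented, and a single critical t-value is passed in by the caller rather than being derived per shell from Student’s t-distribution (it could be obtained, for example, from scipy.stats.t.ppf if SciPy is available).

```python
import math


def cc_half_t_value(r, n):
    """Return t = r * sqrt((n - 2) / (1 - r^2)) for a CC1/2 value r from n pairs."""
    if n <= 2:
        raise ValueError("more than two reflection pairs are required")
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded t-value.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def highest_significant_shell(shells, t_critical):
    """Return the smallest d_min (Å) among shells with a significant CC1/2.

    shells     : iterable of (d_min, cc_half, n_pairs) tuples
    t_critical : critical t-value for the chosen significance level (assumed
                 to be shared by all shells, which is a simplification)
    Returns None if no shell is significant.
    """
    significant = [d_min for d_min, cc_half, n_pairs in shells
                   if cc_half_t_value(cc_half, n_pairs) > t_critical]
    return min(significant) if significant else None


# Hypothetical shells: (d_min in Å, CC1/2, number of reflection pairs).
example_shells = [(2.0, 0.99, 500), (1.8, 0.60, 300),
                  (1.7, 0.20, 200), (1.6, 0.05, 150)]
cutoff = highest_significant_shell(example_shells, t_critical=2.35)
```

In this invented example the 1.7 Å shell is the last one whose t-value exceeds the illustrative threshold, so it would be taken as the highest resolution shell.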
Figure 3.
Distributions of the highest resolution shell show that some designs improve on the WT resolution. Data were collected from multiple X-ray diffraction experiments and determined by significant CC1/2 according to Student’s t-test. Box plots show the median resolution ± one quartile. Unfilled circles indicate the average resolution. The dashed line is the average WT resolution. The K127L, K64R and Q123D designs have higher average resolutions than the WT.
3.4. Rosetta-designed crystals do not behave as predicted
Intrigued by the unexpectedly small variation in resolution between the designed variants, we solved the crystal structures of several candidates to investigate whether there was an underlying structural basis for the small changes in resolution. We discuss the variants below, grouped by their observed effects on SNase crystallization, and provide a general summary of observations across all variants.
3.4.1. Q123D and Q123E variants
In silico, the Q123D design strengthened the crystallographic interface by introducing an electrostatic interaction between Asp123 in the asymmetric unit and Lys71 in a neighboring symmetry mate. Upon solving the crystal structure, we found minor changes (of less than 0.5 Å r.m.s.d.) in the backbone conformation (Supplementary Fig. S4). Analysis of the site around the Q123D mutation revealed that Asp123 interacted with Lys84 of a neighboring symmetry mate (with a 3.1 Å distance between the lysine N atom and aspartic acid O atom) instead of Lys71 (Fig. 4 ▸). This result is in contrast to the WT structure, in which Lys71 interacts with Gln123 (3 Å distance between the corresponding O and N atoms) and Lys84 solely contacts a phosphate O atom of THP (Supplementary Fig. S5).
Figure 4.
The Q123D design (green; PDB entry 6u0x) predicts the correct interaction type but the incorrect interaction residue. In the design, Asp123 is predicted to form an electrostatic interaction with Lys71′ (where the prime indicates the symmetry mate), improving on the WT Gln–Lys interaction (pale yellow). However, this interaction is missing in the density and crystal structure of the variant (both orange; the 2mFo − DFc map contoured at 1.5σ for the Q123D variant crystal structure is carved within 2 Å of residues 71, 84 and 123). In place of the Asp123–Lys71 interaction, Asp123 hydrogen-bonds to Lys84, which also noncovalently interacts with the nucleotide analog (thymidine-3′,5′-diphosphate; THP) bound in the SNase active site. In this figure, each residue belongs to either the asymmetric unit or a different symmetry mate. Key interactions with atom-pair distances less than 3.5 Å are shown as dashed lines.
Although we were unable to solve the crystal structure of the Q123E variant owing to twinning and the presence of ice, we expect a similar interaction to be occurring. This supposition is supported by the observed average resolution, which is quite similar to that of the WT protein: the Q123D mutation slightly improves the resolution, whereas the Q123E mutation slightly worsens the resolution.
3.4.2. K133M variant
Like the Q123E and Q123D variants, the K133M variant did not significantly alter the resolution with respect to the WT protein and resulted in minimal backbone movements (0.17 Å r.m.s.d.; Supplementary Fig. S4). This design was favored in silico because it replaced an unfavorable electrostatic interaction between Lys133 and His8 with a van der Waals contact between Met133 and His8, while also slightly reducing the entropic cost of forming this crystal contact (Fig. 5 ▸). However, the K133M variant crystal structure revealed that although the interface had compacted slightly, the side chains were too distant to interact. Compared with the design, the minimum distance between Met133 and His8 in the crystal increased from 3.6 to 5.3 Å. This lack of interacting side chains is likely to explain the minimal effect of this mutation on crystal resolution.
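The minimum side-chain distance quoted above is the smallest distance over all atom pairs drawn from the two residues. The sketch below shows one way to compute it. The function names are hypothetical, the coordinates passed in are assumed to be Cartesian positions in Å with the symmetry mate already generated, and the 3.5 Å contact cutoff is borrowed from the convention used for dashed lines in the figures.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance (Å) between any atom of A and any atom of B.

    atoms_a, atoms_b: non-empty sequences of (x, y, z) tuples.
    """
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def in_contact(atoms_a, atoms_b, cutoff=3.5):
    """True if the closest atom pair lies within the cutoff (default 3.5 Å)."""
    return min_distance(atoms_a, atoms_b) < cutoff
```

With this definition, a designed minimum distance of 3.6 Å sits just outside a 3.5 Å cutoff, whereas the 5.3 Å separation seen in the crystal is well beyond any reasonable contact distance.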
Figure 5.
Superposition of the designed (green) and crystallized (orange; PDB entry 6u0w) K133M structures. The 2mFo − DFc map is contoured at 1.5σ and is carved within 2 Å of residues 8 and 133. The designed packing interaction (K133M–His8′, where the prime indicates the symmetry mate) does not occur in the crystal; instead, the side chains occupy alternative rotamers.
3.4.3. K127L and K64R variants
The K127L variant produced the highest resolution crystals in our study, although it did so in a manner that was not predicted by design: it altered the crystal space group. The WT protein crystallizes in space group P21, and in this crystal form Lys127 forms a salt bridge with the THP molecule bound in the neighboring SNase active site (Fig. 6 ▸). When this lysine is mutated to leucine, the salt-bridge interaction cannot form, destabilizing the P21 crystal form. Instead, the designed protein crystallizes in P41, a higher symmetry space group, in which Leu127 packs against a neighboring loop by forming backbone interactions with Lys28 and Gly29. In this space group, Lys71 replaces Lys127 as the interacting partner of the THP in the crystal form, suggesting that the interaction of the substrate phosphate groups with a positively charged side chain might be useful for SNase crystallization.
Figure 6.
The K127L variant (PDB entry 6ok8), which yields the highest resolution crystals, crystallizes in a higher symmetry space group (P41) than the WT (P21) because it breaks an electrostatic contact central to a crystallographic interface in P21. (a) In the K127L crystal a lysine–THP salt bridge is retained, albeit with a different lysine residue (Lys71) in place of Lys127. The 2mFo − DFc map is contoured at 1.5σ and is carved within 2 Å of residue 71 and the THP molecule. (b) The new crystallographic interface containing Leu127 is well resolved in density and features nonspecific side chain–backbone contacts. The 2mFo − DFc map is contoured at 1.5σ and is carved within 2 Å of residues 28, 29 and 127.
In addition to the space-group change, the K127L variant had the largest backbone motions of all variants. The motions are in the loop region (residues 114–118) that precedes the α-helix containing Leu127. They occur when Lys116 shifts from contacting the neighboring molecule in the crystal to make contacts with the bound nucleotide analog instead.
The other substitution that improved the resolution in our study, albeit with a sample size of one, was K64R. Although we were unable to solve a crystal structure owing to problems with twinning and the presence of ice, we were able to determine from the diffraction data that the K64R mutation (like the K127L mutation) resulted in a change in space group, in this case from P21 to the higher symmetry space group P212121. To gain structural insight into why this variant and space group might lead to higher resolution, we aligned our K64R model with a different SNase structure crystallized in the same space group (PDB entry 5kee; L. A. Skerritt, J. A. Caro, A. Heroux, J. L. Schlessman & B. Garcia-Moreno E., unpublished work) and applied the symmetry operations necessary to generate the neighboring symmetry mates using unit-cell dimensions from the diffraction data. We then used the Rosetta FastRelax protocol to alleviate any clashes that might have been introduced. We observed two possible hydrogen-bonding interactions for Arg64 that might account for this change in space group: one with the carboxylic acid of Glu135 of a neighboring symmetry mate and another with the backbone carbonyl O atom of the same residue (Fig. 7).
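Generating symmetry mates amounts to applying the space-group operators in fractional coordinates and converting back to Cartesian coordinates with the unit-cell dimensions. The sketch below does this for the four general-position operators of P212121. It is an illustration only, not the procedure used here: the function names are hypothetical, any cell passed in is an assumed orthogonal cell (a, b, c) in Å, and lattice translations to neighboring unit cells are omitted.

```python
# General-position operators of space group P2(1)2(1)2(1), written per axis as
# (signs, shifts) so that x' = sign * x + shift in fractional coordinates.
P212121_OPS = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),    # x, y, z
    ((-1, -1, 1), (0.5, 0.0, 0.5)),  # -x+1/2, -y, z+1/2
    ((-1, 1, -1), (0.0, 0.5, 0.5)),  # -x, y+1/2, -z+1/2
    ((1, -1, -1), (0.5, 0.5, 0.0)),  # x+1/2, -y+1/2, -z
]


def apply_op(frac, op):
    """Apply one (signs, shifts) operator to a fractional coordinate triple."""
    signs, shifts = op
    return tuple(s * x + t for s, x, t in zip(signs, frac, shifts))


def symmetry_mates(xyz, cell, ops=P212121_OPS):
    """Cartesian copies of the point xyz (Å) under each operator.

    cell: (a, b, c) of an orthogonal cell, so fractional = Cartesian / length.
    The first returned copy is the input point itself (identity operator).
    """
    frac = tuple(x / length for x, length in zip(xyz, cell))
    return [
        tuple(f * length for f, length in zip(apply_op(frac, op), cell))
        for op in ops
    ]
```

To build the full set of neighbors around one molecule, each copy would additionally be shifted by integer multiples of a, b and c, keeping only those copies that lie within a chosen distance of the asymmetric unit.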
Figure 7.

Models (orange) of the possible K64R interactions in space group P212121 show two new possible electrostatic interactions with Glu135 of a neighboring symmetry mate (indicated by a prime). (a, b) K64R either interacts with the side-chain or backbone atoms of Glu135. Lys64 was not strongly interacting in the WT, showing multiple possible rotameric states in electron density and missing density for some atoms (Supplementary Fig. S6). Thus (c), the introduced Arg64–Lys78′ interaction in the design (green) was intended to be stabilizing.
3.4.4. Retrospectively: Rosetta score recovers space-group changes but not resolution
Since we did not include the possibility of space-group changes in our design protocol, yet we observed changes for two variants, we retrospectively asked whether Rosetta could recover the correct space group (Fig. 8 ▸) by modeling and scoring each variant and the WT in each of the three most popular SNase space groups. For four out of six crystals (including the WT), we found that Rosetta could correctly predict the space group. Rosetta failed to predict the correct space-group changes for the K127L variant, yielding P212121 as the lowest scoring space group (when the actual space group was P41), and for the K64R substitution, yielding P21 as the lowest scoring space group (the actual space group was P212121).
Figure 8.
Scores of designs in the three most popular space groups for SNase. The lowest scoring space group is the experimentally observed space group in four out of six cases. For each design, average scores are shown ± one standard deviation computed over ten energy-minimized structures in P21, P212121 and P41. The experimentally observed space group is indicated by an unfilled circle. The designs are ordered by the score difference, Δ, between the lowest scoring and the second lowest scoring space group. The Δ value was greater on average for the designs crystallizing in P21 (∼6.3 REU versus ∼3.6 REU). For the two designs (K127L and K64R) that did not crystallize in P21, the correct space group could not be identified based on the score.
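The retrospective space-group comparison reduces to averaging the scores of the minimized models in each space group, taking the lowest mean as the prediction and reporting the gap Δ to the runner-up. A minimal sketch of that bookkeeping follows. The function name and input layout are hypothetical, and any scores supplied to it are placeholders rather than values from this study.

```python
from statistics import mean, stdev


def rank_space_groups(scores_by_group):
    """Summarize repeated minimizations per space group.

    scores_by_group: dict mapping a space-group label to a list of scores
    (REU), with at least two space groups and two scores per group.
    Returns (predicted_group, delta, summary), where delta is the mean score
    of the second-lowest group minus that of the lowest group, and summary
    maps each label to (mean, standard deviation).
    """
    summary = {sg: (mean(v), stdev(v)) for sg, v in scores_by_group.items()}
    ranked = sorted(summary, key=lambda sg: summary[sg][0])
    delta = summary[ranked[1]][0] - summary[ranked[0]][0]
    return ranked[0], delta, summary
```

A small Δ relative to the standard deviations indicates that the score alone does not clearly separate the candidate space groups.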
Finally, we asked whether the Rosetta score of the solved crystal structures correlated with the resolution, as we found to be the case for the engineered variants studied by Mizutani et al. (2008 ▸) (Section 3.1). We analyzed the crystal structures of our designs as previously described. Surprisingly, we found an anticorrelation between score and resolution (Fig. 9 ▸), although it should be noted that our resolution range only spans ∼0.1 Å, whereas a range of ∼0.8 Å was spanned by the crystal structures of Mizutani and coworkers (Fig. 1 ▸).
Figure 9.
Ex post facto analysis by extracting the score from the solved crystal structures of the designs shows an unexpected anticorrelation between resolution and score. Error bars indicate the standard deviation in resolution (from collecting and analyzing multiple diffraction patterns) and score (from ten repeated energy minimizations).
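The relationship between score and resolution can be quantified with a Pearson correlation coefficient over the per-variant averages. The sketch below is a plain-Python version of that calculation. It is offered as an illustration of the statistic, not as the analysis script behind Fig. 9, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

When xs holds scores (REU) and ys holds resolution limits in Å, a positive coefficient means that lower scores accompany numerically smaller resolution limits, and a negative coefficient means the opposite. With only a handful of variants spanning about 0.1 Å, the coefficient is sensitive to each individual point.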
4. Discussion
We attempted to develop and validate a generic computational method for protein crystal contact design to engineer crystals that yield high-resolution structural information. Probing the PDB, we found that the Rosetta score correlated with the reported crystal resolution when accounting for common external variables relevant to crystallization, such as the criteria used to select the highest resolution shell, user handling of the crystal or crystallization conditions. Using data from an existing study in crystal engineering (Mizutani et al., 2008), we developed a design approach that recapitulated resolution-enhancing mutations at a rate of at least 11.5% (and at best 35.3%). We tested our design approach on a model SNase system, Δ+PHS, and found that our initial approach only resulted in one crystallizable variant out of 11, so we improved our approach by designing on an ensemble of backbones to increase this ratio to four in seven. Finally, we solved the crystal structures of several of our designs, but unfortunately observed little to no improvement in resolution. Post facto analysis revealed that (i) improvements in resolution came primarily from changes in space group and (ii) the Rosetta score of designs was not predictive of crystal resolution.
4.1. Point mutations primarily affected side-chain interactions and space groups
In general, when all variants were compared with both the WT and predicted design structures, changes in the fold of SNase were undetectable, but there were detectable changes in the side-chain interactions at the crystallographic interfaces. Firstly, there were minimal changes in the backbone structure, as anticipated for variants that only differ in point mutations at surface residues. The maximum observed root-mean-squared deviation (r.m.s.d.) for backbone atoms (N, C, CA and O) between the designed model (or WT, since the backbone was fixed during design) and variant crystal structures was 0.53 Å for the K127L variant, with all other variants having a lower backbone r.m.s.d. to their respective designed model (or the WT; Supplementary Fig. S4). Secondly, all mutations had some unpredicted effects on the interactions at the targeted crystallographic interface. These effects ranged from slight differences in side-chain rotameric states to entirely new interfaces. The smallest number of differences was observed in the K133M variant, where only a few side-chain dihedral angles differed from the designed structure and the interface was not greatly perturbed in general (Fig. 5 ▸).
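The backbone r.m.s.d. values quoted above are computed over the N, CA, C and O atoms only. A minimal sketch of that calculation is given below, under the assumption that the two structures have already been superposed and list the same atoms in the same order. The function name and the (atom name, coordinates) input layout are hypothetical.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}


def backbone_rmsd(model, reference):
    """Backbone r.m.s.d. (Å) between two pre-superposed atom lists.

    model, reference: sequences of (atom_name, (x, y, z)) in matching order.
    Atoms whose name is not N, CA, C or O (side chains) are ignored.
    """
    if len(model) != len(reference):
        raise ValueError("atom lists differ in length")
    squared = []
    for (name_m, xyz_m), (name_r, xyz_r) in zip(model, reference):
        if name_m != name_r:
            raise ValueError("atom order differs between structures")
        if name_m in BACKBONE:
            squared.append(sum((p - q) ** 2 for p, q in zip(xyz_m, xyz_r)))
    if not squared:
        raise ValueError("no backbone atoms found")
    return math.sqrt(sum(squared) / len(squared))
```

Because side-chain atoms are excluded, a variant can show a small backbone r.m.s.d. while still differing substantially from the design in its side-chain rotamers, as observed here.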
The greatest difference that we observed was that two variants crystallized in higher symmetry space groups than the WT (P21): K64R crystallized in space group P212121 and K127L crystallized in space group P41. Several improvements in data quality accompanied this additional symmetry, such as a consistently higher I/σ value (a measure of the information content) over all resolution shells (Supplementary Fig. S2). The space-group change for the K127L variant was driven by breaking the WT lysine–THP contact across one crystallographic interface (Fig. 6), whereas the driver for the space-group change of the K64R variant was not definitively determined. The K64R substitution was desired because in the WT space group (Fig. 7) it introduced a putative electrostatic interaction between a terminal amino group of Arg64 and the carbonyl O atom of Lys78 of a symmetry mate. The mutation did not disrupt any contacts, as Lys64 was not resolved in the electron density of the WT crystal structure (Supplementary Fig. S6); however, this alteration still resulted in a change in space group. Since we were unable to solve a crystal structure of this variant, we resorted to modeling K64R in the new space group. Our models hinted that this space group might be preferred over that of the WT because Arg64 can potentially form a hydrogen bond to Glu135 via both side chain–side chain and side chain–backbone interactions (Fig. 7).
4.2. The relationship between score and resolution
Our initial hypothesis was that crystal interface stability, as captured by the Rosetta score, would correlate with reported crystal resolution. Yet, analysis comparing Rosetta scoring and reported crystal resolution for a subset of the PDB revealed no relationship between the two. We reasoned that our analysis was obfuscated by many factors that affected resolution but could not be captured by score alone (for example the protein or X-ray source intensity or detector resolution, user handling etc.). Firstly, we controlled for just the protein by analyzing only structures of SNase in the PDB. We found that controlling for the protein alone was insufficient: there was no trend between reported resolution and score for this set. However, when we controlled for more factors by analyzing the crystal structures of 22 variants of a single protein, with all data gathered by the same individual, using the same process and with the same equipment, we found, as hypothesized, that the Rosetta score correlated with the reported resolution (Fig. 1 ▸). Yet, when we repeated the same analysis for crystals of our model protein (again with all experiments conducted identically by one individual), we found an anticorrelation.
Why is there an inconsistent behavior between score and resolution? From the PDB, it is apparent that higher resolution crystal structures tend to have better protein geometry, i.e. fewer improbable side-chain rotamers, fewer outliers for bond lengths, fewer outliers for bond angles or fewer atomic/steric clashes (Williams et al., 2018 ▸). It is possible that because Rosetta represents proteins in internal coordinate space (φ/ψ), with fixed bond lengths and angles, it cannot rescue inherently poor geometry and thus better geometry contributes to a lower Rosetta score, even after energy minimization. Then, it is possible that the correlation observed between Rosetta score and resolution for the Mizutani set was a manifestation of the protein geometry improvements that come with higher resolution data, while some external factor, unaccounted for by the Rosetta score, affected resolution. If resolution does indeed drive score, then for crystals in a narrow range of resolutions we would not expect to observe a correlation between Rosetta score and resolution, as was the case for the Δ+PHS variants. In fact, the structure-validation software MolProbity (Chen et al., 2010 ▸) only compares structures within 0.25 Å bins to account for the improvements in protein geometry offered by higher resolution data.
4.3. Backrub improves design
Over the course of this study, we attempted both fixed-backbone design on the WT backbone and fixed-backbone design on a perturbed ensemble of 200 models generated from the WT backbone. To generate the perturbed ensemble, we used an approach known as backrub (Smith & Kortemme, 2008 ▸) that slightly alters the direction of the Cα–Cβ vector to expose the side chain to a new environment while minimally altering the backbone. We found that variants designed using an ensemble of backbones crystallized at a higher rate (4/7) than variants designed using the single WT backbone (1/11). We reason that this is because backrub-generated ensembles capture local backbone fluctuations, whereas fixed-backbone models do not, resulting in a better estimate of point-mutation effects, including on the interface energy (Barlow et al., 2018 ▸). In general, it is known that proteins are dynamic and are readily capable of incorporating point mutations, especially at protein surface positions (Davis et al., 2006 ▸), so a fixed-backbone approximation is not sufficient. Hence, we observed an increase in the crystallization success rate when we designed on an ensemble of backbones and selected designs that score well across multiple backbones for experimental characterization.
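Selecting designs that score well across multiple backbones can be expressed as a simple filter over per-backbone score changes. The sketch below is one hypothetical way to do so and does not reproduce the selection criteria used in this study. The function name, the 80% threshold and the definition of a score change as mutant minus WT on the same backbone are all assumptions.

```python
from statistics import mean


def select_robust_designs(delta_scores, min_fraction=0.8):
    """Keep mutations that are favorable on most backbones of an ensemble.

    delta_scores: dict mapping a mutation label to a list of score changes
    (mutant minus WT, REU), one per backbone in the ensemble.
    A mutation is kept if its score change is negative on at least
    min_fraction of the backbones. Returns a list of
    (mutation, mean score change, favorable fraction), best mean first.
    """
    picked = []
    for mutation, deltas in delta_scores.items():
        fraction = sum(d < 0 for d in deltas) / len(deltas)
        if fraction >= min_fraction:
            picked.append((mutation, mean(deltas), fraction))
    return sorted(picked, key=lambda item: item[1])
```

A filter of this kind favors mutations whose predicted benefit does not depend on one particular backbone conformation, which is the rationale given above for designing on an ensemble.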
4.4. What we could not predict and why
Of the five designed proteins, only the K133M variant resulted in a crystal structure similar to the design. The designs Q123D and Q123E resulted in the introduction of Glu/Asp–Lys electrostatic interactions, but with a neighboring symmetry mate instead of that targeted by the design. For these designs, it is not clear how to improve the design algorithm.
For the K127L and K64R designs, we observed unpredictable changes in space group. In retrospect, the K127L design should not have scored well in Rosetta, as the Lys127 amino group clearly makes electrostatic contacts with the phosphate groups of the THP molecule bound by the neighboring symmetry mate. Despite breaking the lysine–THP interaction, the Rosetta score was lower for the variant than for the WT protein (in the WT space group), indicating that Rosetta does not correctly weigh the strength of this electrostatic interaction. One possible solution to overcome this issue in the future would be to bias the Rosetta score by the WT electron density, such that eliminating a clearly present interaction is strongly penalized, whereas designing residues that are not well resolved is favored.
One possible reason for the failure to improve the resolution is that Rosetta is not yet finely tuned for the types of atomic interactions we tried to create. Rosetta was first developed to study protein folding in the context of small, globular domains, before being applied to the inverse challenge of design (Baker, 2019 ▸). Over the years Rosetta has performed best when redesigning protein cores and tightly packed interfaces (Kuhlman et al., 2003 ▸; Kortemme et al., 2004 ▸; Rocklin et al., 2017 ▸; Bale et al., 2016 ▸). Our objective here is one of the first attempts to design a loosely packed interface with a significant amount of water involved. Future work to improve the design of solvated interfaces might include explicitly analyzing water interactions at the interface either by flooding, as has recently been successfully used to dock interfacial waters (Kilambi et al., 2013 ▸), or the recently developed HBNet method in Rosetta, which has been used to design hydrogen-bonding networks de novo (Boyken et al., 2016 ▸). Multi-state design (Leaver-Fay et al., 2011 ▸) might also be necessary to prevent undesired changes in space group.
4.5. Were our model protein crystals already optimal?
Initial analyses showed that the diffraction patterns collected for the WT control in this study had an average high-resolution limit of 1.77 Å (Fig. 3 ▸), which agrees with the resolution (1.8 Å) of the PDB-deposited crystal structure of Δ+PHS (PDB entry 3bdc). This value falls firmly in the middle of the distribution of all SNase crystal resolutions (Fig. 1 ▸), with 1.35 and 2.5 Å being the highest and lowest observed resolutions, respectively. Separately, Mizutani and coworkers observed changes of +0.2 to −0.6 Å in their study of the effects of point mutations on diphthine synthase crystals (Mizutani et al., 2008 ▸). Based on these prior observations, we expected to observe changes in resolution of ±0.5 Å; however, we instead found that the designs spanned a narrow range of 1.67–1.85 Å or ∼1.77 ± 0.1 Å. Nonetheless, the variance in resolution within crystals of the same variant compared favorably between our study and that of Mizutani and coworkers. We observed ranges of ∼0.1 Å, while Mizutani and coworkers reported a 95% confidence interval estimate of ±0.05 Å for WT diphthine synthase, analyzing the diffraction data from 13 crystals (Mizutani et al., 2008 ▸).
One possible explanation for the minimal improvement in resolution using our designs is that our choice of model protein, Δ+PHS, was already optimized for forming high-quality crystals. We selected Δ+PHS as a model system for its high stability (11.8 kcal mol⁻¹; Castañeda et al., 2009), high yield (over 60 mg protein per litre of cell culture) and crystallizability (over 300 crystal structures have been deposited in the PDB). We selected for these features so that our model protein would readily incorporate point mutations and so that the corresponding designs would be likely to express in high quantities and readily crystallize. However, these features are also likely to have pre-selected for a protein that is optimal for crystallization, in which a majority of point mutations might not be able to yield significant improvements in resolution. Future work might then focus on a protein that is less engineered and may have more room for improvement.
5. Conflict of interest statement
JJG is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of the revenue received on licensing the Rosetta software, including the programs described here. As a member of the Scientific Advisory Board of Cyrus Biotechnology, JJG is granted stock options. Cyrus Biotechnology distributes the Rosetta software, which may include methods described in this paper.
Supplementary Material
PDB reference: staphylococcal nuclease variant Δ+PHS, K127L mutant, 6ok8
PDB reference: K133M mutant, 6u0w
PDB reference: Q123D mutant, 6u0x
Supplementary Figures and Tables. DOI: 10.1107/S2059798319013226/ud5010sup1.pdf
Analysis of protein-protein interactions in crystallo: https://doi.org/10.5281/zenodo.3216968
Mizutani forward design: https://doi.org/10.5281/zenodo.3222946
Mizutani forward design backrub: https://doi.org/10.5281/zenodo.3228344
SNAse design: https://doi.org/10.5281/zenodo.3228838
SNase backrub design part 1: https://doi.org/10.5281/zenodo.3235486
SNase backrub design part 2: https://doi.org/10.5281/zenodo.3235518
SNAse diffraction data and analysis: https://doi.org/10.5281/zenodo.3228351
Acknowledgments
JRJ, ACR, JMB, BGME and JJG designed the research. JRJ and ACR performed the research. JRJ, ACR, JMB, BGME and JJG analyzed the data. JRJ, ACR, JMB, BGME and JJG wrote the paper. The authors would like to acknowledge Jesse B. Yoder for helpful discussion and Michael L. Love for help with instrumentation. The super-computing resources in this study have been provided in part by the Maryland Advanced Research Computing Center. Diffraction data were collected at the X-ray laboratory of the Department of Biophysics and Biophysical Chemistry, School of Medicine, Johns Hopkins University.
Funding Statement
This work was funded by National Institutes of Health, National Institute of General Medical Sciences grants F31-GM123616, T32-GM008403 and R01-GM078221; National Science Foundation grant NSF-MCB 1517378 to Aaron C. Robinson and Bertrand García-Moreno E.; and Office of the Provost, Johns Hopkins University grant Discovery Award to Jeliazko R. Jeliazkov, , , , and Jeffrey J. Gray.
References
- Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L.-W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. & Zwart, P. H. (2010). Acta Cryst. D66, 213–221.
- Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
- Alford, R. F., Leaver-Fay, A., Jeliazkov, J. R., O’Meara, M. J., DiMaio, F. P., Park, H., Shapovalov, M. V., Renfrew, P. D., Mulligan, V. K., Kappel, K., Labonte, J. W., Pacella, M. S., Bonneau, R., Bradley, P., Dunbrack, R. L. Jr, Das, R., Baker, D., Kuhlman, B., Kortemme, T. & Gray, J. J. (2017). J. Chem. Theory Comput. 13, 3031–3048.
- Baker, D. (2019). Protein Sci. 28, 678–683.
- Bale, J. B., Gonen, S., Liu, Y., Sheffler, W., Ellis, D., Thomas, C., Cascio, D., Yeates, T. O., Gonen, T., King, N. P. & Baker, D. (2016). Science, 353, 389–394.
- Barlow, K. A., Ó Conchúir, S., Thompson, S., Suresh, P., Lucas, J. E., Heinonen, M. & Kortemme, T. (2018). J. Phys. Chem. B, 122, 5389–5399.
- Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
- Bliven, S., Lafita, A., Parker, A., Capitani, G. & Duarte, J. M. (2018). PLOS Comput. Biol. 14, e1006104.
- Boyken, S. E., Chen, Z., Groves, B., Langan, R. A., Oberdorfer, G., Ford, A., Gilmore, J. M., Xu, C., DiMaio, F., Pereira, J. H., Sankaran, B., Seelig, G., Zwart, P. H. & Baker, D. (2016). Science, 352, 680–687.
- Castañeda, C. A., Fitch, C. A., Majumdar, A., Khangulov, V., Schlessman, J. L. & García-Moreno, B. E. (2009). Proteins, 77, 570–588.
- Chaudhury, S., Lyskov, S. & Gray, J. J. (2010). Bioinformatics, 26, 689–691.
- Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S. & Richardson, D. C. (2010). Acta Cryst. D66, 12–21.
- Conway, P., Tyka, M. D., DiMaio, F., Konerding, D. E. & Baker, D. (2014). Protein Sci. 23, 47–55.
- Cooper, D. R., Boczek, T., Grelewska, K., Pinkowska, M., Sikorska, M., Zawadzki, M. & Derewenda, Z. (2007). Acta Cryst. D63, 636–645.
- Dale, G. E., Oefner, C. & D’Arcy, A. (2003). J. Struct. Biol. 142, 88–97.
- Davis, I. W., Arendall, W. B., Richardson, D. C. & Richardson, J. S. (2006). Structure, 14, 265–274.
- DiMaio, F., Leaver-Fay, A., Bradley, P., Baker, D. & André, I. (2011). PLoS One, 6, e20450.
- DiMaio, F., Terwilliger, T. C., Read, R. J., Wlodawer, A., Oberdorfer, G., Wagner, U., Valkov, E., Alon, A., Fass, D., Axelrod, H. L., Das, D., Vorobiev, S. M., Iwaï, H., Pokkuluri, P. R. & Baker, D. (2011). Nature (London), 473, 540–543.
- Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. (2010). Acta Cryst. D66, 486–501.
- Evans, P. (2006). Acta Cryst. D62, 72–82.
- Evdokimov, A. G., Mekel, M., Hutchings, K., Narasimhan, L., Holler, T., McGrath, T., Beattie, B., Fauman, E., Yan, C., Heaslet, H., Walter, R., Finzel, B., Ohren, J., McConnell, P., Braden, T., Sun, F., Spessard, C., Banotai, C., Al-Kassim, L., Ma, W., Wengender, P., Kole, D., Garceau, N., Toogood, P. & Liu, J. (2008). J. Struct. Biol. 162, 152–169.
- Fusco, D., Barnum, T. J., Bruno, A. E., Luft, J. R., Snell, E. H., Mukherjee, S. & Charbonneau, P. (2014). PLoS One, 9, e101123.
- Fusco, D., Headd, J. J., De Simone, A., Wang, J. & Charbonneau, P. (2013). Soft Matter, 10, 290–302.
- García-Moreno, B. E., Dwyer, J. J., Gittis, A. G., Lattman, E. E., Spencer, D. S. & Stites, W. E. (1997). Biophys. Chem. 64, 211–224.
- Giegé, R. (2013). FEBS J. 280, 6456–6497.
- Kabsch, W. (2010). Acta Cryst. D66, 125–132.
- Karplus, P. A. & Diederichs, K. (2012). Science, 336, 1030–1033.
- Karplus, P. A. & Diederichs, K. (2015). Curr. Opin. Struct. Biol. 34, 60–68.
- Kilambi, K. P., Pacella, M. S., Xu, J., Labonte, J. W., Porter, J. R., Muthu, P., Drew, K., Kuroda, D., Schueler-Furman, O., Bonneau, R. & Gray, J. J. (2013). Proteins, 81, 2201–2209.
- Kishishita, S., Shimizu, K., Murayama, K., Terada, T., Shirouzu, M., Yokoyama, S. & Kunishima, N. (2008). Acta Cryst. D64, 397–406.
- Knight, S., Andersson, I. & Brändén, C.-I. (1989). Science, 244, 702–705.
- Kortemme, T., Joachimiak, L. A., Bullock, A. N., Schuler, A. D., Stoddard, B. L. & Baker, D. (2004). Nature Struct. Mol. Biol. 11, 371–379.
- Kuhlman, B., Dantas, G., Ireton, G. C., Varani, G., Stoddard, B. L. & Baker, D. (2003). Science, 302, 1364–1368.
- Lai, Y.-T., Wang, T., O’Dell, S., Louder, M. K., Schön, A., Cheung, C. S. F., Chuang, G.-Y., Druz, A., Lin, B., McKee, K., Peng, D., Yang, Y., Zhang, B., Herschhorn, A., Sodroski, J., Bailer, R. T., Doria-Rose, N. A., Mascola, J. R., Langley, D. R. & Kwong, P. D. (2019). Nature Commun. 10, 47.
- Lanci, C. J., MacDermaid, C. M., Kang, S., Acharya, R., North, B., Yang, X., Qiu, X. J., DeGrado, W. F. & Saven, J. G. (2012). Proc. Natl Acad. Sci. USA, 109, 7304–7309.
- Lawson, D. M., Artymiuk, P. J., Yewdall, S. J., Smith, J. M. A., Livingstone, J. C., Treffry, A., Luzzago, A., Levi, S., Arosio, P., Cesareni, G., Thomas, C. D., Shaw, W. V. & Harrison, P. M. (1991). Nature (London), 349, 541–544.
- Leaver-Fay, A., Jacak, R., Stranges, P. B. & Kuhlman, B. (2011). PLoS One, 6, e20937.
- Leaver-Fay, A., Snoeyink, J. & Kuhlman, B. (2008). Bioinformatics Research and Applications, edited by I. Măndoiu, R. Sunderraman & A. Zelikovsky, pp. 343–354. Berlin, Heidelberg: Springer.
- Liu, H. & Naismith, J. H. (2008). BMC Biotechnol. 8, 91.
- McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.
- Mizutani, H., Saraboji, K., Malathy Sony, S. M., Ponnuswamy, M. N., Kumarevel, T., Krishna Swamy, B. S., Simanshu, D. K., Murthy, M. R. N. & Kunishima, N. (2008). Acta Cryst. D64, 1020–1033.
- Nivón, L. G., Moretti, R. & Baker, D. (2013). PLoS One, 8, e59004.
- Oubridge, C., Ito, N., Teo, C.-H., Fearnley, I. & Nagai, K. (1995). J. Mol. Biol. 249, 409–423.
- Park, H., Bradley, P., Greisen, P., Liu, Y., Mulligan, V. K., Kim, D. E., Baker, D. & DiMaio, F. (2016). J. Chem. Theory Comput. 12, 6201–6212.
- Price, W. N. II, Chen, Y., Handelman, S. K., Neely, H., Manor, P., Karlin, R., Nair, R., Liu, J., Baran, M., Everett, J., Tong, S. N., Forouhar, F., Swaminathan, S. S., Acton, T., Xiao, R., Luft, J. R., Lauricella, A., DeTitta, G. T., Rost, B., Montelione, G. T. & Hunt, J. F. (2009). Nature Biotechnol. 27, 51–57.
- Richardson, J. S. (1981). Adv. Protein Chem. 34, 167–339.
- Rocklin, G. J., Chidyausiku, T. M., Goreshnik, I., Ford, A., Houliston, S., Lemak, A., Carter, L., Ravichandran, R., Mulligan, V. K., Chevalier, A., Arrowsmith, C. H. & Baker, D. (2017). Science, 357, 168–175.
- Rupp, B. & Segelke, B. (2001). Nature Struct. Biol. 8, 663–664.
- Smith, C. A. & Kortemme, T. (2008). J. Mol. Biol. 380, 742–756.
- Stranges, P. B. & Kuhlman, B. (2013). Protein Sci. 22, 74–82.
- Wang, G. & Dunbrack, R. L. (2003). Bioinformatics, 19, 1589–1591.
- Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (2018). Protein Sci. 27, 293–315.
- Wlodawer, A., Minor, W., Dauter, Z. & Jaskolski, M. (2008). FEBS J. 275, 1–21.
Associated Data
Supplementary Materials
PDB reference: staphylococcal nuclease variant Δ+PHS, K127L mutant, 6ok8
PDB reference: K133M mutant, 6u0w
PDB reference: Q123D mutant, 6u0x
Supplementary Figures and Tables. DOI: 10.1107/S2059798319013226/ud5010sup1.pdf
Analysis of protein–protein interactions in crystallo: https://doi.org/10.5281/zenodo.3216968
Mizutani forward design: https://doi.org/10.5281/zenodo.3222946
Mizutani forward design backrub: https://doi.org/10.5281/zenodo.3228344
SNase design: https://doi.org/10.5281/zenodo.3228838
SNase backrub design part 1: https://doi.org/10.5281/zenodo.3235486
SNase backrub design part 2: https://doi.org/10.5281/zenodo.3235518
SNase diffraction data and analysis: https://doi.org/10.5281/zenodo.3228351








