Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds

Brian Coventry; David Baker

doi:10.1371/journal.pcbi.1008061

PLoS Comput Biol. 2021 Mar 8;17(3):e1008061. doi: 10.1371/journal.pcbi.1008061


Brian Coventry¹,², David Baker²,³,⁴,*

Editor: Dina Schneidman-Duhovny⁵

PMCID: PMC7971855 PMID: 33684097

Abstract

In aqueous solution, polar groups make hydrogen bonds with water, and hence burial of such groups in the interior of a protein is unfavorable unless the loss of hydrogen bonds with water is compensated by formation of new ones with other protein groups. For this reason, buried “unsatisfied” polar groups making no hydrogen bonds are very rare in proteins. Efficiently representing the energetic cost of unsatisfied hydrogen bonds with a pairwise-decomposable energy term during protein design is challenging, since whether or not a group is satisfied depends on all of its neighbors. Here we describe a method for assigning a pairwise-decomposable energy to sidechain rotamers such that, following combinatorial sidechain packing, buried unsatisfied polar atoms are penalized. The penalty can be any quadratic function of the number of hydrogen bonds made by each buried polar group, and it can be computed very rapidly. We show that inclusion of this term in Rosetta sidechain packing calculations substantially reduces the number of buried unsatisfied polar groups.

Author summary

We present an algorithm that fits into existing protein design software and allows researchers to penalize unsatisfied polar atoms in protein structures during design. These polar atoms usually make hydrogen bonds to other polar atoms or to water molecules, and the absence of such interactions leaves them energetically unsatisfied. Penalizing this condition is tricky because protein design software only looks at pairs of amino acids when deciding which amino acids to choose. Current approaches to this problem are additive: satisfaction is approximated on a continuous scale, whereas in reality satisfaction is an all-or-none condition. The simplest all-or-none method is to penalize every buried polar atom and then to give a bonus each time it is satisfied by a hydrogen bond. This fails when two different amino acids satisfy the same atom, because the pairwise nature of the design software double counts the satisfaction bonus. Here we show that by anticipating the situation where two amino acids satisfy the same polar atom, we can apply a penalty between those two amino acids in advance, assuming the polar atom will be present. This scheme correctly penalizes unsatisfied polar atoms and does not fall victim to overcounting.

This is a PLOS Computational Biology Methods paper.

Introduction

Polar groups on the surface of proteins in aqueous solution make favorable hydrogen bonds with water molecules. If these polar groups become buried, either upon folding or upon binding to another protein, these hydrogen bonds with water must be broken. The energetic penalty of losing h-bonds with water can be offset if the buried polar atom makes a hydrogen bond to another protein atom. We say that this second polar atom “satisfies” the first. If a buried polar atom makes no hydrogen bond, we call it a “buried unsatisfied” polar atom.

Modeling the loss of favorable interactions for buried unsatisfied polar atoms is straightforward with explicit solvent models, since upon burial, interactions with explicit water molecules are lost. Implicit solvation models have a considerable advantage in computational efficiency over explicit models. This matters for protein design and other applications where large-scale sampling is required and the chemical composition (amino acid identity) is changing. The most computationally efficient implicit solvent models are pairwise additive. However, identifying and penalizing buried unsatisfied polar atoms is challenging with such models, because burial is a collective property of many atoms. Instead, most current methods use non-pairwise-additive approaches, often involving solvent accessible surface area calculations. The BuriedUnsatisfiedPolarsCalculator in Rosetta, for instance, first calculates which atoms are inaccessible to solvent and then determines whether or not they are making a hydrogen bond. These methods work well on a fixed protein, but they are not amenable to the implicit-solvation, pairwise-decomposable sidechain packer of Rosetta[1] or to other rotamer-based packing algorithms.

There have been several attempts to capture the energetic cost of buried unsatisfied polar atoms in a pairwise manner. The LK solvation model gives each polar atom a penalty when another atom enters its implicit solvation sphere[2]. The LK-Ball solvation model extends the LK model by restricting this desolvation penalty to the positions most critical for hydrogen bonding[3]. A downside of these pairwise methods is that they are intrinsically additive. A real atom switches: once completely buried, it can no longer hydrogen bond with water. In these models, burial is instead gradual and depends on the local density of nearby atoms. These approaches do not specifically penalize buried unsatisfied polar atoms. Instead, they attempt to model this effect indirectly by balancing the energies of desolvation and hydrogen bonding.

Materials and methods

We describe a method for explicitly penalizing buried unsatisfied polar atoms during sidechain packing, called the 3-Body Oversaturation Penalty (3BOP). We take advantage of the fact that, in rotamer-based sidechain packing calculations, the calculated energies must be pairwise-decomposable, but the calculations leading to these energies need not be. The method is applicable to fixed-backbone packing trajectories and requires that all rotamers be available before the Monte Carlo trajectory begins.

Burial region calculation

To assign the buried region in a sequence-independent way, so that it can be determined before amino acid sequence design, the protein is first mutated to poly-leucine with Chi1 = 240° and Chi2 = 120°. Next, the EDTSurf method[4,5] is used for three steps. It voxelizes Cartesian space at 0.5 Å resolution, determines the molecular surface with a 2.3 Å radius probe sphere, and labels each voxel with its depth below the molecular surface. The burial region is then defined as all voxels at least X Å below the molecular surface, where X is between 3.5 and 5.5 Å depending on user preference. Fig 1B shows an example of a 4.5 Å burial region on ubiquitin[6].
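Once the depth map exists, deciding whether a polar atom is buried reduces to a voxel lookup. The sketch below makes three assumptions for illustration: a precomputed depth grid stored as nested lists, a known grid origin, and the 0.5 Å spacing given above. It does not reproduce the EDTSurf surface and depth calculation itself. `DepthGrid` and `is_buried` are hypothetical names, not Rosetta or EDTSurf API.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DepthGrid:
    """Precomputed depth (Å) of each voxel below the molecular surface.

    Assumed layout: depths[i][j][k] is the depth of the voxel whose corner is
    origin + (i, j, k) * spacing; voxels outside the molecule hold 0.0.
    Computing the depths themselves (EDTSurf in the text) is not shown.
    """
    origin: Vec3
    spacing: float  # 0.5 Å in the text
    depths: List[List[List[float]]]

    def depth_at(self, xyz: Vec3) -> float:
        # Map a Cartesian coordinate to integer voxel indices.
        i, j, k = (int((c - o) // self.spacing) for c, o in zip(xyz, self.origin))
        ni, nj, nk = len(self.depths), len(self.depths[0]), len(self.depths[0][0])
        if 0 <= i < ni and 0 <= j < nj and 0 <= k < nk:
            return self.depths[i][j][k]
        return 0.0  # outside the grid: treat as solvent-exposed


def is_buried(grid: DepthGrid, xyz: Vec3, cutoff: float = 4.5) -> bool:
    """A polar atom is buried if its voxel lies at least `cutoff` Å below the surface."""
    return grid.depth_at(xyz) >= cutoff


# Toy 3x3x3 grid: every voxel 1.0 Å deep except the centre voxel at 5.0 Å.
depths = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
depths[1][1][1] = 5.0
grid = DepthGrid(origin=(0.0, 0.0, 0.0), spacing=0.5, depths=depths)
print(is_buried(grid, (0.75, 0.75, 0.75)))  # True
print(is_buried(grid, (0.10, 0.10, 0.10)))  # False
```

Because the grid is built from the poly-leucine model, this lookup can be done once for every polar atom of every rotamer before packing starts.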

Fig 1 — **Overview of method** (A) Pictorial representation of the penalty rules. A buried GLU oxygen atom is satisfied by two serine hydroxyls. An oversaturation penalty is applied to the two hydroxyls. In this example: β = 1, σ = -1, ω = 1. (B) Burial region calculation applied to Ubiquitin (PDB: 1UBQ)[6]. Grey spheres indicate the buried region at 4.5Å depth from the poly-LEU molecular surface with a 2.3Å probe size and were generated using PyRosetta[7]. Images generated with PyMOL[8].

Penalty calculation

After the burial region calculation, all buried polar atoms in all rotamers are identified. One-body and two-body atom pseudoenergies can then be assigned with the following simple algorithm:

for B in all_buried_polar_atoms:
    Constants β, σ, ω
    Accumulate 1-body energy β to B
    for Q in atoms_that_hbond_to_B:
        Accumulate 2-body energy σ to edge B<->Q
    for Q1, Q2 in pairs of (atoms_that_hbond_to_B):
        Accumulate 2-body energy ω to edge Q1<->Q2

The constants β, σ, and ω represent the atom burial penalty, the atom-atom satisfaction bonus, and the atom-atom oversaturation penalty. They may be selected on a per-buried-polar-atom basis according to Eq 1. Fig 1A gives a pictorial illustration of this algorithm. Its cost is O(n³) in the local number of polar rotamers that h-bond at nearby sequence positions, because it approximates a 3-body interaction.
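A minimal Python sketch of these rules follows. It assumes hypothetical inputs: a list of buried polar atom ids, a map from each buried atom to the atoms that h-bond to it, a map from each atom to its rotamer, and per-atom (β, σ, ω) tuples. The 1-body and 2-body energies are keyed by rotamer and by rotamer pair. The pseudocode above does not address a pair of atoms that belong to the same rotamer. Booking that case as a 1-body term is our own assumption.

```python
from collections import defaultdict
from itertools import combinations


def _add_pair(one_body, two_body, r1, r2, energy):
    """Add an energy between two rotamers; a rotamer paired with itself is a 1-body term."""
    if r1 == r2:
        one_body[r1] += energy
    else:
        two_body[frozenset((r1, r2))] += energy


def simple_3bop(buried, hbond_partners, rotamer_of, params):
    """Uncorrected 3BOP rules.

    buried:         iterable of buried polar atom ids (B)
    hbond_partners: atom id -> list of atom ids that h-bond to it (Q)
    rotamer_of:     atom id -> id of the rotamer containing that atom
    params:         atom id -> (beta, sigma, omega)
    """
    one_body = defaultdict(float)
    two_body = defaultdict(float)
    for b in buried:
        beta, sigma, omega = params[b]
        one_body[rotamer_of[b]] += beta                      # burial penalty
        partners = hbond_partners.get(b, [])
        for q in partners:                                    # satisfaction bonus
            _add_pair(one_body, two_body, rotamer_of[b], rotamer_of[q], sigma)
        for q1, q2 in combinations(partners, 2):              # oversaturation penalty
            _add_pair(one_body, two_body, rotamer_of[q1], rotamer_of[q2], omega)
    return one_body, two_body


def packed_energy(chosen, one_body, two_body):
    """Sum the 1-body and 2-body pseudoenergies of one chosen rotamer per position."""
    chosen = list(chosen)
    energy = sum(one_body.get(r, 0.0) for r in chosen)
    for r1, r2 in combinations(chosen, 2):
        energy += two_body.get(frozenset((r1, r2)), 0.0)
    return energy


# Fig 1A: a buried Glu oxygen satisfied by two Ser hydroxyls, beta=1, sigma=-1, omega=1.
buried = ["glu.OE1"]
partners = {"glu.OE1": ["ser1.OG", "ser2.OG"]}
rot = {"glu.OE1": "glu", "ser1.OG": "ser1", "ser2.OG": "ser2"}
one, two = simple_3bop(buried, partners, rot, {"glu.OE1": (1.0, -1.0, 1.0)})
print(packed_energy(["glu", "ser1", "ser2"], one, two))  # 0.0
print(packed_energy(["glu"], one, two))                  # 1.0
```

Suppose all H partners of a buried atom are selected. The packer then sums one β, H copies of σ and H(H-1)/2 copies of ω, which is the quadratic form given as Eq 1 in the Results.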

Oversaturation rotamer correction

A problem occurs with this simple algorithm when several B atoms, from different rotamers at the same sequence position, hydrogen bond to the same Q1<->Q2 pair. With each additional B, the oversaturation penalty between Q1 and Q2 rises. This accumulated penalty is an error: rotamers at the same position are mutually exclusive, so at most one of these B atoms can exist at a time. The solution is to limit the oversaturation penalty on each edge to the maximum value that any single rotamer at that position can generate.

Corrected 3BOP Algorithm:

for n in [1 … N_res]:
    Let M = map(key = Rotamer<->Rotamer, value = list(rotamers_at_n))
    for R in rotamers_at_n:
        for B in buried_polar_atoms_of_R:
            Constants β, σ, ω
            Accumulate 1-body energy β to B
            for Q in atoms_that_hbond_to_B:
                Accumulate 2-body energy σ to B<->Q
            for Q1, Q2 in non-redundant-paired(atoms_that_hbond_to_B):
                M[r_Q1<->r_Q2][R] += ω
    for r_Q1<->r_Q2 in M.keys():
        Accumulate 2-body energy max(M[r_Q1<->r_Q2]) to r_Q1<->r_Q2

where r_Q refers to the rotamer containing Q. The memory footprint of M can be greatly reduced if, instead of storing a list, one stores only the running maximum value after iterating over each rotamer R.
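The corrected rules can be sketched in Python as below. The sketch stores only the running maximum per edge, as suggested above. Input names and data layout are assumptions for illustration, not Rosetta's implementation. The `corrected=False` switch is a hypothetical addition that sums over rotamers instead of taking the maximum, which makes the overcounting problem visible.

```python
from collections import defaultdict
from itertools import combinations


def corrected_3bop(rotamers_by_pos, buried_atoms_of, hbond_partners, rotamer_of,
                   params, corrected=True):
    """3BOP with the per-position oversaturation correction.

    rotamers_by_pos: sequence position -> list of rotamer ids at that position
    buried_atoms_of: rotamer id -> list of its buried polar atom ids
    hbond_partners:  atom id -> list of atom ids that h-bond to it
    rotamer_of:      atom id -> id of the rotamer containing that atom
    params:          atom id -> (beta, sigma, omega); assumes omega >= 0, so a
                     rotamer that adds nothing to an edge counts as 0
    corrected=False sums omega over rotamers instead of taking the max.
    """
    one_body = defaultdict(float)
    two_body = defaultdict(float)  # key: sorted (rotamer, rotamer) tuple

    def add(r1, r2, energy):
        if r1 == r2:
            one_body[r1] += energy  # intra-rotamer pair treated as 1-body
        else:
            two_body[tuple(sorted((r1, r2)))] += energy

    for rotamers in rotamers_by_pos.values():
        best = {}  # edge -> running max (or sum) of omega over rotamers at this position
        for r in rotamers:
            this_rot = defaultdict(float)  # omega that rotamer r alone puts on each edge
            for b in buried_atoms_of.get(r, []):
                beta, sigma, omega = params[b]
                one_body[r] += beta
                partners = hbond_partners.get(b, [])
                for q in partners:
                    add(r, rotamer_of[q], sigma)
                for q1, q2 in combinations(partners, 2):
                    this_rot[tuple(sorted((rotamer_of[q1], rotamer_of[q2])))] += omega
            for edge, value in this_rot.items():
                previous = best.get(edge, 0.0)
                best[edge] = max(previous, value) if corrected else previous + value
        for (r1, r2), value in best.items():
            add(r1, r2, value)
    return one_body, two_body


# Two alternative Glu rotamers at position 1, each satisfied by the same two Ser hydroxyls.
rotamers_by_pos = {1: ["glu_a", "glu_b"], 2: ["ser1"], 3: ["ser2"]}
buried_atoms_of = {"glu_a": ["glu_a.OE1"], "glu_b": ["glu_b.OE1"]}
partners = {"glu_a.OE1": ["ser1.OG", "ser2.OG"], "glu_b.OE1": ["ser1.OG", "ser2.OG"]}
rot = {"glu_a.OE1": "glu_a", "glu_b.OE1": "glu_b", "ser1.OG": "ser1", "ser2.OG": "ser2"}
params = {"glu_a.OE1": (1.0, -1.0, 1.0), "glu_b.OE1": (1.0, -1.0, 1.0)}
_, fixed = corrected_3bop(rotamers_by_pos, buried_atoms_of, partners, rot, params)
_, naive = corrected_3bop(rotamers_by_pos, buried_atoms_of, partners, rot, params,
                          corrected=False)
print(fixed[("ser1", "ser2")], naive[("ser1", "ser2")])  # 1.0 2.0
```

Contributions from different positions still add up across the outer loop, because buried atoms at different positions can coexist. Only alternatives at the same position are collapsed to their maximum.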

Results

Buried unsatisfied polar quadratic penalty

The 3BOP algorithm described in the Materials and Methods generates a penalty P of the form

P = \beta + \sigma H + \omega \frac{H(H-1)}{2}

(1)

where H is the number of h-bonds, β is the penalty for burying a polar atom, σ is the bonus for satisfying a buried polar atom, and ω is the penalty for oversaturating a buried polar atom. Because of the quadratic term, this formulation can better describe the “all or none” aspect of buried unsatisfied atoms than linear models such as the LK solvation model. The coefficients β, σ, and ω can be modified to give any quadratic relationship between the number of h-bonds and the penalty. Additionally, they can be modified on a per-atom basis to give different penalty profiles to different atom types. As Table 1 shows, parameters can in general be chosen to favor any single number of h-bonds, or any pair of consecutive numbers.
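The link between the packing rules and Eq 1 can be made explicit. For one buried polar atom whose H h-bonding partners are all present, the rules contribute:

- one 1-body term β,
- H satisfaction edges σ,
- one oversaturation edge ω per pair of partners.

The closed forms for a chosen target k are our own rearrangement, offered as a worked check rather than as a rule stated by the authors. They reproduce the coefficients in Table 1.

```latex
\begin{aligned}
P(H) &= \beta + \sigma H + \omega \binom{H}{2}
      = \beta + \sigma H + \omega \, \frac{H(H-1)}{2} \\[4pt]
\text{target } k:\quad &\omega = 2,\; \sigma = 1 - 2k,\; \beta = k^2
  \;\Rightarrow\; P(H) = (H - k)^2 \\[4pt]
\text{target } k \text{ or } k+1:\quad &\omega = 1,\; \sigma = -k,\; \beta = \tfrac{k(k+1)}{2}
  \;\Rightarrow\; P(H) = \tfrac{(H - k)(H - k - 1)}{2}
\end{aligned}
```

Two rows of Table 1 follow directly:

- k = 2 in the first form gives β = 4, σ = -3, ω = 2 (the NH2 row).
- k = 2 in the second form gives β = 3, σ = -2, ω = 1 (the carboxylate O row).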

Table 1. Example penalty schemes for different atom types.

Atom type	Target # h-bonds	Coefficients	P(0)	P(1)	P(2)	P(3)	P(4)	P(5)
NH1	1	β = 1, σ = -1, ω = 2	1	0	1	4	9	16
Carbonyl O	1 or 2	β = 1, σ = -1, ω = 1	1	0	0	1	3	6
NH2	2	β = 4, σ = -3, ω = 2	4	1	0	1	4	9
Carboxylate O	2 or 3	β = 3, σ = -2, ω = 1	3	1	0	0	1	3
NH3	3	β = 9, σ = -5, ω = 2	9	4	1	0	1	4

P(H): resulting penalty for a buried polar atom making H h-bonds (Eq 1).


Examples of coefficients that give a penalty of 1 for buried polar atoms that are off by one from their target number of h-bonds. The target numbers of h-bonds listed here are examples only and are not necessarily ideal.
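Table 1 can be regenerated directly from Eq 1. The sketch below evaluates the penalty for 0 to 5 h-bonds for each row. It also includes a hypothetical helper, `coefficients_for_target`, that encodes the two coefficient patterns visible in the table: ω = 2 for a single target, and ω = 1 for a consecutive pair. The helper is our generalization, not a rule stated by the authors.

```python
def penalty(h, beta, sigma, omega):
    """Eq 1: penalty for a buried polar atom making h h-bonds (h*(h-1) is always even)."""
    return beta + sigma * h + omega * (h * (h - 1) // 2)


def coefficients_for_target(k, allow_k_plus_one=False):
    """(beta, sigma, omega) giving zero penalty at k (or at k and k+1), 1 when off by one.

    Our own closed form: omega=2 yields (H-k)^2; omega=1 yields (H-k)(H-k-1)/2.
    """
    if allow_k_plus_one:
        return k * (k + 1) // 2, -k, 1
    return k * k, 1 - 2 * k, 2


# Coefficients from Table 1, keyed by atom type.
TABLE_1 = {
    "NH1": (1, -1, 2),
    "Carbonyl O": (1, -1, 1),
    "NH2": (4, -3, 2),
    "Carboxylate O": (3, -2, 1),
    "NH3": (9, -5, 2),
}

for atom_type, coeffs in TABLE_1.items():
    print(atom_type, [penalty(h, *coeffs) for h in range(6)])
```

Integer coefficients are used so that every value in the table is reproduced exactly.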

Because the atomic depth calculation is performed on a poly-leucine model, burial does not depend on the sequence or on sidechain conformations, so atomic burial can be pre-computed once before packing begins. A consequence is that polar atoms just below the surface are never considered buried. Consider, for instance, a backbone carbonyl oxygen that lies outside the burial region but is covered by a phenylalanine ring. Explicit-solvent or SASA-based measures would count such an oxygen as buried, but this algorithm would not.

Fewer buried unsatisfied polar atoms

Four hundred de novo one-sided interfaces to barnase[9] were designed to test whether 3BOP produces designs that make hydrogen bonds to buried polar atoms. Designed mini-proteins[10] docked to the polar interface of barnase had fewer buried unsatisfied polar atoms when 3BOP was added to the ref2015[11] energy function (Fig 2A). With the recommended “+ 5 x 3BOP” setting, the median design from 3BOP + ref2015 had 5 buried unsatisfied polar atoms, while the median design from ref2015 alone had 7. A reduction from 7 to 5 may seem small, but many of the docks had polar atoms that were impossible to satisfy (e.g. an edge strand docked against a hydrophobic surface). Some docks fully satisfied all polar atoms on the target: 3BOP + ref2015 generated 3 such docks, while ref2015 alone generated none. As a further test, one hundred native proteins were redesigned allowing only polar residues, as described in S1 Text. S1A Fig shows that redesigning the proteins with 3BOP + ref2015 resulted in a median of 2 buried unsatisfied polar atoms, while ref2015 alone resulted in a median of 5 (the 3BOP designs also had fewer buried polar atoms; S1C Fig).

Fig 2 — 40 mini-proteins of mixed alpha/beta and all-alpha topology were docked against barnase (PDB: 1BRS) using PatchDock[12,13], and the top 10 docks for each were selected. Residues within 10Å of the interface were allowed to repack on barnase and to design with all 20 amino acids on the mini-protein. A ramped-repulsive pack-and-minimize scheme was used to arrive at the final amino acid sequence[14] using the score function listed at the left of the panels. All methods that start with “+” use ref2015. For “+ 5 x 3BOP No Over”, the parameters were β = 5, σ = -5, and ω = 0, and for “+ 5 x 3BOP”, the parameters were β = 5, σ = -5, and ω = 5 for all polar atoms. The “+ 10 x” variants used 10 instead of 5 for the respective parameters. The maximum energy for an h-bond in Rosetta is approximately -2 kcal/mol. The “+ 2.5 x H-Bond” and “+ 5 x H-Bond” variants add the h-bond term with an extra weight of 2.5 or 5 on top of ref2015's own. This raises the maximum h-bond energy to about -7 and -12 kcal/mol, respectively. A) Number of buried unsatisfied polar atoms at the interface. In order from the left, vertical divisions indicate the number of proteins that have 0–1, 1–2, 2–3, or more unsatisfied polar atoms as indicated in the last row. B) Number of cross-interface h-bonds. In order from the left, each division represents the number of proteins that had from X to (X+2) cross-interface h-bonds, with each division to the right representing (X+3) to (X+5). See S1 Text for more information and S1 Scripts for scripts to reproduce the data in S1 Data.

3BOP is better than simply upweighting h-bonds

Figs 2A and S1A show that incorporating 3BOP results in fewer buried unsatisfied polar atoms than simply increasing the hydrogen bond strength. However, a limitation of our approximation is that the oversaturation penalties do not depend on the presence of the rotamer containing the buried polar atom. For instance, in Fig 1, if the glutamate rotamer were not present, the penalty between the serine rotamers would still be applied. To investigate how often this happens, we repacked Outer Membrane Phospholipase A of E. coli with an implementation of 3BOP in Rosetta. This natural protein has extensive buried polar networks. As S2A Fig shows, the more permissive the h-bond energy threshold, the more extraneous penalties appear. As the number of rotamers increases, either by adding extra rotamers or by enabling design, the number of extraneous penalties increases further. This problem may be reduced by making the h-bond-quality threshold more stringent or by limiting the number of polar rotamers.
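The extraneous penalty can be reproduced with the Fig 1A example. The sketch hard-codes the pseudoenergies that the 3BOP rules would assign for that case (β = 1, σ = -1, ω = 1). It then compares the pairwise total against an exact 3-body count that charges Eq 1 only when the Glu oxygen is actually present. The rotamer names, including the alternative `ala` rotamer at the Glu position, are hypothetical.

```python
from itertools import combinations

# Pseudoenergies the 3BOP rules assign for Fig 1A (beta=1, sigma=-1, omega=1).
ONE_BODY = {"glu": 1.0}
TWO_BODY = {
    frozenset({"glu", "ser1"}): -1.0,
    frozenset({"glu", "ser2"}): -1.0,
    frozenset({"ser1", "ser2"}): 1.0,  # oversaturation penalty, charged whenever both Ser appear
}


def pairwise_energy(chosen):
    """What the packer sees: 1-body plus 2-body terms of the chosen rotamers."""
    chosen = sorted(chosen)
    energy = sum(ONE_BODY.get(r, 0.0) for r in chosen)
    energy += sum(TWO_BODY.get(frozenset(p), 0.0) for p in combinations(chosen, 2))
    return energy


def exact_penalty(chosen, beta=1.0, sigma=-1.0, omega=1.0):
    """True 3-body version: Eq 1 applies only if the buried Glu oxygen exists."""
    if "glu" not in chosen:
        return 0.0
    h = sum(s in chosen for s in ("ser1", "ser2"))
    return beta + sigma * h + omega * h * (h - 1) / 2


state = {"ala", "ser1", "ser2"}  # Glu position packed with a non-polar rotamer
print(pairwise_energy(state), exact_penalty(state))  # 1.0 0.0
```

With the Glu rotamer present, both functions agree (0.0). Once the Glu is replaced, the pairwise model still charges the Ser-Ser penalty, which is the extraneous oversaturation measured in S2A Fig.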

While the extraneous oversaturation penalties may pose a problem for structure prediction, their effect on protein design is less severe. A designed backbone may admit several sequence solutions, so if one of them is eliminated by an error, other equally valid solutions remain. In addition, a very large number of backbones can be attempted, and even if all solutions for one backbone are eliminated, another backbone will likely provide solutions. Overall, for design, moving forward with designs that contain buried unsatisfied polar atoms is a more serious error than incorrectly eliminating a reasonable design.

Discussion

An advantage of an explicit penalty for buried unsatisfied polar atoms is that protein designers can set the penalty to values appropriate to their application. Previous approaches have sought to model the energetics of protein electrostatics more accurately, but their final functional forms are intertwined with the rest of the force field in a way that cannot be arbitrarily modified. Explicit control allows designers to directly penalize buried unsatisfied polar atoms and to control the level of hydrogen bond satisfaction in their designs. For example, suitable parameter choices (Table 1) can specify whether the NH2 group of glutamine must make 1, 2, or either 1 or 2 h-bonds to be considered satisfied. This paper and the implementation in Rosetta do not consider explicit bound water molecules from crystal structures or other sources, but incorporating them is straightforward. As long as the locations of the water molecules are known at pre-compute time, they may simply be modeled as polar atoms that can make hydrogen bonds. The 3BOP algorithm is computationally efficient (S3 Fig). In its current form, without explicit water atoms, it is now widely used in our research group. We expect that it will help broadly to address long-standing issues with buried unsatisfied polar atoms in de novo protein design.

Supporting information

S1 Fig. Effect of penalty on protein redesign.

100 native proteins were redesigned using the energy function and protocol listed at the left of the panels, allowing only polar sidechains. Each method/row uses the same parameters as Fig 2A, except that Lysine-NZ used β = 15 and σ = -10 for the “5 x” variants and β = 30 and σ = -20 for the “10 x” variants. A) Number of buried unsatisfied polar atoms for each protein. In order from the left, vertical divisions indicate the number of proteins that have 0, 1, 2, or more unsatisfied polar atoms as indicated in the last row. B) Number of h-bonds to buried polar atoms. In order from the left, each division represents the number of proteins that had from X to (X+4) h-bonds to buried polar atoms, with each division to the right representing from (X+5) to (X+9). C) Number of buried polar atoms. In order from the left, each division represents the number of proteins that had from X to (X+9) buried polar atoms, with each division to the right representing from (X+10) to (X+19). For more information, see S1 Text.

(TIF)


S2 Fig. Extraneous oversaturation and performance.

A) The Outer Membrane Phospholipase A (PDB: 1ILZ) [15] was either repacked with standard rotamers (purple plus) or extra rotamers (pink cross), or redesigned with all amino acids using standard rotamers (green down arrow) or extra rotamers (yellow up arrow). An expansive buried h-bond network exists in the structure. The plot shows the percentage of native rotamers in this h-bond network that experience extraneous oversaturation penalties to other native rotamers. It is plotted against the energy threshold for an h-bond to be considered. In short, the extraneous oversaturation penalties were determined by running the 3BOP algorithm and looking for penalties between native rotamers that were not present before the design/repack rotamers were considered (see S1 Text). The black line shows the percentage of h-bonds in the h-bond network that pass the energy threshold. B) Ninety-seven native proteins had their h-bond network residues redesigned using only polar amino acids. The amino acid recovery error of ref2015 (grey), ref2015 + 5 x 3BOP No Over (light blue), and ref2015 + 5 x 3BOP (dark blue) is plotted. Parameters for the 3BOP tests were identical to Fig 2, except that Lysine-NZ used β = 15 and σ = -10. The h-bond threshold was set to -0.75. See S1 Text for details.

(TIF)


S3 Fig. Performance of Rosetta-implementation of 3BOP.

Each stacked bar graph represents the CPU time spent performing a packing or design calculation on 1ILZ using the pre-computed interaction graph setting. The bars, from top to bottom, are:

- Red (top): time spent applying the penalty rules to rotamers.
- Green: time spent calculating h-bonds between rotamers before the 3BOP algorithm.
- Orange: time spent calculating atomic depth.
- Blue (bottom): runtime of the background packing or design calculation.

With better data management, the green bar could be avoided, because h-bonds are calculated twice here (the other calculation occurs inside the blue bar). *While 3BOP adds a large runtime penalty here, only 2% of this runtime is spent calculating the actual 3-body interactions; 98% of the runtime is spent later in dictionary lookups during rotamer-pair energy assignment.

(TIF)


S1 Text. Data collection for figures.

(PDF)


S1 Scripts. Scripts to produce results.

(TAR)


S1 Data. Results data.

(TAR)


Acknowledgments

We thank Vikram K. Mulligan and Scott Boyken for helpful thoughts and conversations about buried unsatisfied polar atoms and the idea of creating an energetic penalty for these.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The authors received no specific funding for this work.

References

1. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487: 545–574. doi:10.1016/B978-0-12-381270-4.00019-6
2. Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins Struct Funct Bioinforma. 1999;35: 133–152.
3. O’Meara MJ, Leaver-Fay A, Tyka MD, Stein A, Houlihan K, DiMaio F, et al. Combined Covalent-Electrostatic Model of Hydrogen Bonding Improves Structure Prediction with Rosetta. J Chem Theory Comput. 2015;11: 609–622. doi:10.1021/ct500864r
4. Xu D, Zhang Y. Generating Triangulated Macromolecular Surfaces by Euclidean Distance Transform. PLOS ONE. 2009;4: e8140. doi:10.1371/journal.pone.0008140
5. Xu D, Li H, Zhang Y. Protein Depth Calculation and the Use for Improving Accuracy of Protein Fold Recognition. J Comput Biol. 2013;20: 805–816. doi:10.1089/cmb.2013.0071
6. Vijay-Kumar S, Bugg CE, Cook WJ. Structure of ubiquitin refined at 1.8 Å resolution. J Mol Biol. 1987;194: 531–544. doi:10.1016/0022-2836(87)90679-6
7. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26: 689–691. doi:10.1093/bioinformatics/btq007
8. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8. 2015.
9. Buckle AM, Schreiber G, Fersht AR. Protein-protein recognition: Crystal structural analysis of a barnase-barstar complex at 2.0-Å resolution. Biochemistry. 1994;33: 8878–8889. doi:10.1021/bi00196a004
10. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357: 168–175. doi:10.1126/science.aan0693
11. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017;13: 3031–3048. doi:10.1021/acs.jctc.7b00125
12. Duhovny D, Nussinov R, Wolfson HJ. Efficient Unbound Docking of Rigid Molecules. In: Guigó R, Gusfield D, editors. Algorithms in Bioinformatics. Berlin, Heidelberg: Springer; 2002. pp. 185–200. doi:10.1007/3-540-45784-4_14
13. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res. 2005;33: W363–W367. doi:10.1093/nar/gki481
14. Maguire J, Haddox H, Strickland D, Halabiya S, Coventry B, Cummins M, et al. Perturbing the energy landscape for improved packing during computational protein design. Preprints; 2020 May. doi:10.1002/prot.26030
15. Snijder HJ, Eerde JHV, Kingma RL, Kalk KH, Dekker N, Egmond MR, et al. Structural investigations of the active-site mutant Asn156Ala of outer membrane phospholipase A: Function of the Asn–His interaction in the catalytic triad. Protein Sci. 2001;10: 1962–1969. doi:10.1110/ps.17701

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008061.r001

Decision Letter 0

Dina Schneidman-Duhovny

26 Aug 2020

Dear Mr. Coventry,

Thank you very much for submitting your manuscript "Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present an approximation algorithm to the problem of penalizing the burial of polar atoms when those atoms don't form hydrogen bonds. This problem cannot be readily expressed with terms of a pairwise decomposable energy function. If a polar atom is considered buried, then it can be assigned a penalty, and then hydrogen bonds to that polar atom can be assigned a bonus to compensate for that penalty. However, if multiple hydrogen bonds form to a single atom, they would all get that bonus and instead of simply compensating for the burial of the polar atom, they keep accumulating their bonuses. In the context of protein design, this would produce sequences that form an unrealistic number of hydrogen bonds.

The authors approach this problem by putting oversaturation penalties between pairs of atoms that hydrogen bond to the same third atom. The calculation of this penalty depends on three things at once (three atoms / three rotamers) and is therefore not pairwise decomposable; however, this calculation can occur before the beginning of the sequence optimization step. The oversaturation penalty between the pairs of atoms that hydrogen bond to the same third atom is included in the pair interaction energy of those two atoms and will be included whether or not the third atom is present.

This is a clever approach to a vexing problem, and the authors present some compelling results, but there are a few things that leave the reader unsatisfied.

1. The authors describe their oversaturation penalty as "the algorithm" throughout the text. It needs a name. "Algorithm" is too general a term -- and isn't strictly speaking even the right term for the approach the authors present. "Oversaturation penalty" perhaps? "Three-body oversaturation penalty" (3BOP)? This is a minor point and wouldn't be the first in the list except that it makes it a little harder to write this review. I will refer to the authors' approach as "3BOP" for the remainder of this review.

2. The authors don't really tell us what parameters to use for 3BOP and leave out some important details about the experiments that they performed. For example, figures 2A and 2B both include two rows "+ 5*Algorithm" and "+10*Algorithm" without much in the way of an indication of what 5 and 10 mean or what values for the three parameters that define 3BOP (b, s, & v) they are using (or whether they are using different values for each of the atom types?). Furthermore, it is not even clear which of the two (5 vs 10) the authors are endorsing as the best. (The authors do send the readers to the supplemental materials in the caption to this figure, but something as pertinent to the understanding of a figure should not require going to the supplement.)

The "complete-protein, polar only" design test presented in Figure 2 is an interesting one. At first glance, it seems silly, since no protein designer would ever design a protein that lacked hydrophobic residues, but at second glance, it seems like a clever way to prevent an algorithm that wants to avoid burying unsatisfied polar atoms from avoiding polar atoms altogether. It is fascinating and confusing that the median number of buried polar atoms per design could decrease from 5 to 2 while the number of hydrogen bonds only increased by two and certainly looks better than the version of the score function that simply increases the strength of hydrogen bonds. However, on third glance, the 3BOP still has an "out" that the polar-only setup doesn't perfectly avoid: it is possible for 3BOP to create fewer buried unsatisfied polar atoms by simply designing fewer polar atoms: it could choose poly serine and do better than the default score function, ref2015, which might try to use larger residues that coincidentally contain more polar atoms. It seems important to report the number of buried polar atoms that the different score functions create.

A better test would be 1-sided interface design and interrogate the number of buried unsatisfied polar atoms on the side held fixed. This test case has the further advantage of being much closer to the purpose where 3BOP is likely to be employed.

3. Figure 3B shows that only a tiny fraction of the total design time is consumed on 3BOP; however, 3BOP relies on the pre-calculation of all pairs of rotamer energies ahead of design, and the standard design algorithm in Rosetta is to compute pair energies on the fly. The authors should mention whether 3BOP can be adapted to be used in an on-the-fly energy calculation scheme and, if not, how much of a performance penalty is incurred switching between on-the-fly and pre-calculation.

4. Figure 3A shows a concerning presence of oversaturation penalties between pairs of atoms that are simultaneously hydrogen bonding to the same absent/phantom atom. It seems like there is likely some price that comes with the use of 3BOP, but there's little that the authors present in terms of investigations into what that price would be. Does it impact the probability of seeing certain amino acid pairs at certain CB distances, perhaps?

5. The choice of letters and formalism in the pseudocode is confusing. Typically, a set would be labeled with a capital letter (e.g. S) and an element of that set with a lowercase letter (e.g. s). In the pseudocode, however, the capital letter B denotes an element of the set of buried polar atoms, and the lowercase letter b denotes the one-body penalty for a buried polar atom. Perhaps Greek letters could be used for the constants: beta, sigma, and omega? Furthermore, the "Q1-Q2" or "B-Q" notation looks like subtraction rather than the pair interaction energy between the atoms.
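The one-body and two-body bookkeeping that points 4 and 5 refer to can be illustrated with a small toy sketch. It is a hypothetical model, not the 3BOP or Rosetta implementation. It assumes that buried polar atoms are always present (for example, backbone atoms), that each chosen rotamer satisfies a known set of them, and that the penalty is linear (b per unsatisfied atom) rather than the general quadratic form. Every buried atom starts penalized, each rotamer earns a one-body bonus per atom it satisfies, and a two-body term charges the bonus back whenever two rotamers satisfy the same atom, so the bonus is not double counted. The function names and the atom and rotamer labels are invented for illustration.

```python
from itertools import combinations


def pairwise_energy(buried_atoms, chosen, satisfies, b=1.0):
    """Toy one-body + two-body score for a chosen set of rotamers.

    Hypothetical model, not the 3BOP implementation:
    buried_atoms -- set of buried polar atom ids, assumed always present
                    (e.g. backbone atoms), so their penalty is a constant.
    chosen       -- rotamer ids, one per designable position.
    satisfies    -- dict mapping rotamer id -> set of atom ids it H-bonds to.
    b            -- linear penalty per unsatisfied buried polar atom.
    """
    # Constant term: every buried polar atom starts out penalized.
    energy = b * len(buried_atoms)
    # One-body terms: each rotamer earns a bonus of b per atom it satisfies.
    for r in chosen:
        energy -= b * len(satisfies[r] & buried_atoms)
    # Two-body terms: if two rotamers satisfy the same atom, charge b back
    # so that atom's bonus is not counted twice.
    for r1, r2 in combinations(chosen, 2):
        energy += b * len(satisfies[r1] & satisfies[r2] & buried_atoms)
    return energy


def brute_force_penalty(buried_atoms, chosen, satisfies, b=1.0):
    """Reference value: b times the number of buried atoms left unsatisfied."""
    satisfied = set().union(*(satisfies[r] for r in chosen))
    return b * len(buried_atoms - satisfied)


# Atom N1 is satisfied by two rotamers; O3 is satisfied by none.
buried = {"N1", "O2", "O3"}
sat = {"SER_a": {"N1"}, "THR_b": {"N1", "O2"}, "ALA_c": set()}
choice = ["SER_a", "THR_b", "ALA_c"]
print(pairwise_energy(buried, choice, sat), brute_force_penalty(buried, choice, sat))
# prints: 1.0 1.0
```

In this toy form, an atom with k satisfying rotamers contributes b(1 − k + k(k − 1)/2), which is b, 0, 0 for k = 0, 1, 2 but b again for k = 3. The truncated pairwise form is therefore exact only when no atom has more than two partners; the paper's oversaturation handling is not reproduced here.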

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology, see http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. 2021 Mar 8;17(3):e1008061. doi: 10.1371/journal.pcbi.1008061.r002

Author response to Decision Letter 0

5 Jan 2021

Attachment

Submitted filename: Buried Unsat Letter to Reviewer.pdf

Additional data file (27.3 KB, PDF).

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008061.r003

Decision Letter 1

Dina Schneidman-Duhovny

12 Feb 2021

Dear Mr. Coventry,

We are pleased to inform you that your manuscript 'Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have satisfied all of my requests with their revision.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Reviewer #1: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008061.r004

Acceptance letter

Dina Schneidman-Duhovny

2 Mar 2021

PCOMPBIOL-D-20-01008R1

Protein sequence optimization with a pairwise decomposable penalty for buried unsatisfied hydrogen bonds

Dear Dr Coventry,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Alice Ellingham

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Effect of penalty on protein redesign.

(TIF)

Additional data file (2.8 MB, TIF).

S2 Fig

(TIF)

Additional data file (988.9 KB, TIF).

S3 Fig. Performance of Rosetta-implementation of 3BOP.

(TIF)

Additional data file (405.3 KB, TIF).

S1 Text. Data collection for figures.

(PDF)

Additional data file (120.1 KB, PDF).

S1 Scripts. Scripts to produce results.

(TAR)

Additional data file (40 KB, TAR).

S1 Data. Results data.

(TAR)

Additional data file (150 KB, TAR).


Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

1. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487: 545–574. 10.1016/B978-0-12-381270-4.00019-6

2. Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins Struct Funct Bioinforma. 1999;35: 133–152.

3. O’Meara MJ, Leaver-Fay A, Tyka MD, Stein A, Houlihan K, DiMaio F, et al. Combined Covalent-Electrostatic Model of Hydrogen Bonding Improves Structure Prediction with Rosetta. J Chem Theory Comput. 2015;11: 609–622. 10.1021/ct500864r

4. Xu D, Zhang Y. Generating Triangulated Macromolecular Surfaces by Euclidean Distance Transform. PLOS ONE. 2009;4: e8140. 10.1371/journal.pone.0008140

5. Xu D, Li H, Zhang Y. Protein Depth Calculation and the Use for Improving Accuracy of Protein Fold Recognition. J Comput Biol. 2013;20: 805–816. 10.1089/cmb.2013.0071

6. Vijay-Kumar S, Bugg CE, Cook WJ. Structure of ubiquitin refined at 1.8 Å resolution. J Mol Biol. 1987;194: 531–544. 10.1016/0022-2836(87)90679-6

7. Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010;26: 689–691. 10.1093/bioinformatics/btq007

8. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8. 2015.

9. Buckle AM, Schreiber G, Fersht AR. Protein-protein recognition: Crystal structural analysis of a barnase-barstar complex at 2.0-Å resolution. Biochemistry. 1994;33: 8878–8889. 10.1021/bi00196a004

10. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357: 168–175. 10.1126/science.aan0693

11. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017;13: 3031–3048. 10.1021/acs.jctc.7b00125

12. Duhovny D, Nussinov R, Wolfson HJ. Efficient Unbound Docking of Rigid Molecules. In: Guigó R, Gusfield D, editors. Algorithms in Bioinformatics. Berlin, Heidelberg: Springer; 2002. pp. 185–200. 10.1007/3-540-45784-4_14

13. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res. 2005;33: W363–W367. 10.1093/nar/gki481

14. Maguire J, Haddox H, Strickland D, Halabiya S, Coventry B, Cummins M, et al. Perturbing the energy landscape for improved packing during computational protein design. Preprints; 2020 May. 10.1002/prot.26030

15. Snijder HJ, Eerde JHV, Kingma RL, Kalk KH, Dekker N, Egmond MR, et al. Structural investigations of the active-site mutant Asn156Ala of outer membrane phospholipase A: Function of the Asn–His interaction in the catalytic triad. Protein Sci. 2001;10: 1962–1969. 10.1110/ps.17701
