Surface residues and nonadditive interactions stabilize a consensus homeodomain protein

Matt Sternke; Katherine W Tripp; Doug Barrick

doi:10.1016/j.bpj.2021.10.035

2021 Oct 30;120(23):5267–5278. doi: 10.1016/j.bpj.2021.10.035

PMCID: PMC8715166 PMID: 34757081

Abstract

Despite the widely reported success of consensus design in producing highly stabilized proteins, little is known about the physical mechanisms underlying this stabilization. Here, we explore the potential sources of stabilization by performing a systematic analysis of the 29 substitutions that we previously found to collectively stabilize a consensus homeodomain compared with an extant homeodomain. By separately introducing groups of consensus substitutions that alter or preserve charge state, occur at varying degrees of residue burial, and occur at positions of varying degrees of conservation, we determine the extent to which these three features contribute to the consensus stability enhancement. Surprisingly, we find that the largest total contribution to stability comes from consensus substitutions on the protein surface and that the largest per substitution contributions come from substitutions that maintain charge state. This finding suggests that, although consensus proteins are often enriched in charged residues, consensus stabilization does not result primarily from interactions involving charged residues. Although consensus substitutions at strongly conserved positions also contribute disproportionately to stabilization, significant stabilization is also contributed from substitutions at weakly conserved positions. Furthermore, we find that identical consensus substitutions show larger stabilizing effects when introduced into the consensus background than when introduced into an extant homeodomain, indicating that synergistic, stabilizing interactions among the consensus residues contribute to consensus stability enhancement of the homeodomain. By measuring DNA binding affinity for the same set of variants, we find that, although consensus design of the homeodomain increases both affinity and folding stability, it does so using a largely nonoverlapping set of substitutions.

Significance

Proteins composed of consensus sequences from multiple sequence alignments are often more stable than extant proteins. Often, about half the residues in a consensus protein differ from those of extant proteins. The relative contributions of these many sequence differences to stability are unknown. Here, we substitute groups of residues with different properties (conservation, charge variation, and solvent accessibility) to determine which substitutions lead to consensus stabilization. We find that surface and charge-conserving substitutions contribute to stability, that weakly conserved substitutions make a significant collective contribution to stability, and that there is a significant nonadditive contribution to stability in the consensus background. These results provide insights to the sequence origins of consensus stabilization and the evolutionary constraints that determine protein sequences.

Introduction

Many natural proteins have stabilities ranging from 5 to 15 kcal mol⁻¹ (1). These modest stabilities have been argued to result from weak selective pressure on stability rather than an upper limit (1,2). Indeed, the stabilities of proteins from thermophilic organisms (3) and nonnatural designed proteins (4) significantly exceed this marginal range. Furthermore, natural protein sequences can be stabilized by point substitutions (5,6). Thus, there is great potential to design novel, highly stable proteins as well as stabilize natural protein folds beyond their marginal stabilities.

Designing proteins for high stability, however, has proven to be a significant challenge because most substitutions are destabilizing (5,6). Although a number of stabilized proteins have been created using computational and rational design (4,7,8,9) as well as laboratory evolution methods (10,11), success rates can be low, and methods are often labor intensive and require considerable expertise. As one indication of the challenges that persist in protein engineering, state-of-the-art computational methods often show limited predictive power in the simple task of correctly classifying point substitutions as stabilizing versus destabilizing (12).

The rapid growth of protein sequence databases over the last decade has enabled sequence-based methods in protein design (13,14). One sequence-based method, consensus sequence design, has shown considerable promise in generating highly stabilized natural protein folds. In consensus design, a consensus protein is generated in which the residue at each position is the most frequent residue in a corresponding multiple sequence alignment (MSA). Consensus design has been shown to stabilize proteins across a number of protein families with different architectures, chain lengths, phylogenetic distributions, and functions in most (but not all) cases (13,15).

Despite the broadly demonstrated success of consensus design in increasing protein stability, little is known of the sequence, structural, or physical origins of consensus stabilization. Rather than optimizing protein sequences based on physical principles, consensus design is a bioinformatic approach. The underlying physical basis for consensus stabilization is encoded in the conservation of residues in an MSA. One general sequence feature that distinguishes consensus proteins from natural proteins is an increase in the number of charged residues at the expense of polar uncharged residues (16). Differences between consensus sequences and natural sequences occur most frequently at positions of low conservation and positions on the protein surface (16). Identifying which of these general classes of substitutions gives rise to consensus stabilization may allow for further stabilization and provide a route to achieve maximal stability with a minimal number of substitutions.

Surprisingly, although full consensus proteins are typically more stable than the extant sequences from which they derive, individual consensus substitutions in extant proteins are often destabilizing. Only about half of substitutions toward consensus are stabilizing; the other half are destabilizing or have no effect on stability (13). An additive model in which the effect on stability of the full consensus protein is the sum of effects from the individual substitutions predicts a minimal increase in stability from full consensus substitution because the stabilizing substitutions would be offset by the destabilizing substitutions (17). This discrepancy suggests nonadditive interactions among consensus residues may contribute to consensus stabilization; however, such interactions have not been reported in consensus proteins.

To determine which types of substitutions give rise to consensus stability enhancement and to probe nonadditive stabilization among consensus residues, we performed a mutational analysis using a consensus homeodomain (CHD) that we previously found to be stabilized by ∼4.5 kcal mol⁻¹ relative to the well-studied Drosophila melanogaster engrailed homeodomain (EnHD) (18). To explore the contributions to stability from specific sequence and structural features, we introduced sets of consensus substitutions that share a specific feature into EnHD and characterized the thermodynamic stabilities of the resulting variants. These features include residue charge state, degree of side-chain burial, and extent of conservation. In addition, we probed nonadditivity by comparing the effect of identical consensus substitutions (both larger sets of residues and individual residues) in the EnHD and CHD backgrounds and measured affinities of these variants to the engrailed DNA binding sequence. We find that the largest total contributions to stability arise from consensus substitutions on the protein surface, whereas the largest per-residue-stabilizing effects come from substitutions that maintain charge state and substitutions at strongly conserved positions. However, no single feature accounts for the entire consensus stability increment. We also find that stabilizing effects correlate with the number of consensus substitutions introduced in each variant, suggesting that consensus stabilization arises from marginal, incremental effects distributed across each feature. Furthermore, we find that some consensus residue substitutions show larger stabilizing effects when introduced in the CHD background than the EnHD background, suggesting that favorable nonadditive couplings among consensus residues contribute to the significantly enhanced stability of the CHD.
As with folding stability, the different groups of substitutions contribute to the increased affinity, with some groups contributing more than others. There is little correlation between the types of residues that make the largest contributions to DNA affinity and those that contribute to folding stability, indicating that, although consensus design of the homeodomain (HD) enhances both folding stability and DNA binding, it does so through different sets of residues.

Materials and methods

Cloning, protein expression, and purification

The sequence of CHD was determined from a Pfam (19) seed alignment containing 182 sequences. Sequences in this alignment have an average pairwise identity of 38%, with minimal and maximal pairwise identities of 14 and 81%. Genes encoding EnHD and CHD in pET24 expression plasmids were described previously (18). Single-residue substitutions were introduced using a QuikChange Site-Directed Mutagenesis Kit (Agilent Technologies, Santa Clara, CA). Genes encoding sequences with multiple substitutions were synthesized by GeneArt (Thermo Fisher Scientific, Waltham, MA) and cloned into the pET24 expression plasmids using a Gibson Assembly Master Mix (New England Biolabs, Ipswich, MA). All constructs contain an N-terminal Met-Gly-Ser and a C-terminal His₆ tag. Proteins were expressed and purified as previously described (18).

Equilibrium guanidine hydrochloride denaturations

Equilibrium guanidine hydrochloride (GdnHCl)-induced folding and unfolding transitions were monitored by circular dichroism (CD) using an Aviv Model 400 Spectropolarimeter (Aviv Biomedical, Lakewood, NJ). All experiments were performed using a Hamilton automated titrator (Hamilton Company, Reno, NV). For each experiment, samples with identical protein concentrations were prepared in a native buffer containing 25 mM NaPO₄ (pH 7.0) and 150 mM NaCl and a denaturing buffer containing 25 mM NaPO₄ (pH 7.0) and 150 mM NaCl with a high concentration of GdnHCl (∼8 M for most experiments). GdnHCl concentrations in denatured protein solutions were determined using refractometry (20). Because the proteins were found to fold and unfold reversibly, we collected denaturation curves in either the forward (titrating in the denaturant) or reverse direction (diluting away the denaturant) to optimize sampling of the folded and unfolded baselines. For proteins of lower stability (for example, EnHD single-residue variants), experiments were performed in the forward direction to ensure adequate sampling of the native baseline. For proteins of higher stability (for example, CHD single-residue variants), experiments were performed in the reverse direction to ensure adequate sampling of the denatured baseline. For a subset of proteins, denaturation experiments were performed in both the forward and reverse directions to verify that the same conformational transitions were obtained for the two directions.

For each denaturation, titrant solution (protein in denaturant for the forward direction; protein in buffer for the reverse direction) was injected into a sample solution in a 1-cm-pathlength cuvette (protein in buffer for the forward direction; protein in denaturant for the reverse direction). Samples were allowed to equilibrate for 5 min after each titrant injection. After equilibration, the CD signal at 222 nm was averaged for 30 s at each GdnHCl concentration. Protein concentrations ranged from 2 to 12 μM. All titrations were performed at 20°C and collected in triplicate for each variant.

Equilibrium folding free energies in the absence of denaturant (ΔG°_H2O) and denaturant sensitivity coefficients (m-values) were determined using a two-state linear extrapolation model (21). Because the denaturation midpoint (C_m) values of the most stable variants are high, determining ΔG°_H2O-values requires a long extrapolation, which amplifies uncertainties that result from correlation between ΔG°_H2O- and m-values. To minimize these uncertainties (22), we fit all unfolding curves for all variants to a global model (see Fig. S1 for a comparison of the global fits to local fits). In the global model, each unfolding curve was fit with local folded and unfolded baselines and a local ΔG°_H2O parameter, and a single m-value parameter was shared among all curves and all variants. Before fitting, the CD signals were normalized such that the maximal signal intensity in each curve has a value of 1 and the minimal signal intensity has a value of 0 according to the following:

Y_{norm, i} = \frac{Y_{max} - Y_{obs, i}}{Y_{max} - Y_{min}},

(1)

where Y_norm,i is the normalized signal of the ith data point in the curve, Y_max and Y_min are the maximal and minimal signal intensities, respectively, in the curve, and Y_obs,i is the signal intensity of the ith data point. For each variant, ΔG°_H2O-values were averaged over the three replicate unfolding curves, and uncertainties were determined as the standard errors of the mean. The difference in the stability of two variants was determined as the difference in folding free energies (ΔΔG°_H2O) between the two variants. Uncertainties in ΔΔG°_H2O-values were propagated from the uncertainties in ΔG°_H2O-values according to the following:

δ_{Δ Δ G_{i, j}} = \sqrt{δ_{Δ G_{i}}^{2} + δ_{Δ G_{j}}^{2}},

(2)

where $δ_{Δ Δ G_{i, j}}$ is the uncertainty in the ΔΔG°_H2O-value for variant i and variant j, and $δ_{Δ G_{i}}^{2}$ and $δ_{Δ G_{j}}^{2}$ are the squared uncertainties in the ΔG°_H2O-values of variant i and variant j, respectively.
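As a minimal sketch of Eqs. 1 and 2, the normalization and the uncertainty propagation could be written as follows. The function names and the numerical inputs are hypothetical and chosen only for illustration; they are not taken from the study's analysis scripts or data.

```python
import math


def normalize_cd(signal):
    """Eq. 1: rescale one unfolding curve so that it spans 0 to 1."""
    y_max, y_min = max(signal), min(signal)
    return [(y_max - y) / (y_max - y_min) for y in signal]


def ddg_with_uncertainty(dg_i, err_i, dg_j, err_j):
    """Difference in folding free energy between variants i and j,
    with the uncertainty propagated in quadrature (Eq. 2)."""
    return dg_i - dg_j, math.sqrt(err_i ** 2 + err_j ** 2)


# Hypothetical CD readings (arbitrary units) and free energies (kcal/mol).
curve = normalize_cd([-30.0, -20.0, -10.0])
ddg, ddg_err = ddg_with_uncertainty(-5.0, 0.03, -3.0, 0.04)
```

With Eq. 1 as written, the most negative reading in a curve maps to 1 and the least negative reading maps to 0.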

For EnHD, GdnHCl-unfolding data are from a previous study (18). For CHD, we performed GdnHCl denaturations in the reverse direction to thoroughly sample the unfolded protein baseline and to maintain consistency with data collected here for CHD variants. The free energies and m-values for EnHD and CHD reported here are determined from the global model and differ slightly from previously reported values (18).
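The two-state linear extrapolation model used in this section can be illustrated with a short sketch. It assumes a sign convention in which the folding free energy is negative for a stable protein and the m-value is positive, so that the folding free energy rises linearly with denaturant concentration. Baseline terms and the global fit with a shared m-value are omitted, and the function names and example parameters are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fraction_folded(denaturant_molar, dg_h2o, m_value, temp_k=293.15):
    """Fraction of folded protein for a two-state transition.

    dg_h2o  : folding free energy in water, kcal/mol (negative = stable)
    m_value : denaturant sensitivity, kcal mol^-1 M^-1 (assumed positive)
    temp_k  : temperature in kelvin (293.15 K corresponds to 20 degrees C)
    """
    dg = dg_h2o + m_value * denaturant_molar  # linear extrapolation
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def midpoint(dg_h2o, m_value):
    """Denaturant concentration at which half the protein is folded (C_m)."""
    return -dg_h2o / m_value


# Hypothetical parameters: -4.0 kcal/mol stability, m = 1.0 kcal/mol/M.
c_m = midpoint(-4.0, 1.0)
half = fraction_folded(c_m, -4.0, 1.0)
```

Because C_m equals the ratio of ΔG°_H2O to the m-value, a high C_m implies a long extrapolation back to zero denaturant, which is the source of the parameter correlation that the shared m-value is meant to reduce.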

DNA binding measurements

DNA affinities were determined using isothermal titration calorimetry (ITC). ITC measurements were made as described in (18). ITC titrations were performed on a VP-ITC from MicroCal (Northampton, MA). The DNA sequence 5′-CGACTAATTAGTGC-3′ and its complement were purchased from IDT. These oligonucleotides were annealed as previously described (18). Before titration, both protein and DNA samples were dialyzed overnight into 25 mM NaPO₄ (pH 7.0) and 250 mM NaCl. High salt was used to decrease binding affinities to a range where they could be measured accurately. Titrations were done at 20°C, with protein concentrations ranging from 50 to 300 μM, depending on the affinity of the complex. The DNA concentrations were 10 times lower than the protein (5–30 μM). Titrations were set up as previously described (18). Peak integration was done using NITPIC (23). Integrated data were fitted to a single-site model using SEDPHAT (24).

Residue solvent-accessible surface area calculations

Residue-specific side-chain solvent-accessible surface areas (SASA) were calculated using the EnHD crystal structure (Protein Data Bank: 1ENH) and were compared with the average SASA of the respective side chain in an ensemble of Gly-X-Gly tripeptides using GETAREA (25). Residues showing a relative side-chain SASA less than 20% were classified as buried, residues showing a relative side-chain SASA between 20 and 50% were classified as intermediate, and residues showing a relative side-chain SASA greater than 50% were classified as surface positions. Residues D1, K2, K57, and K58 are not present in the EnHD crystal structure but were classified as surface positions because they are the N- and C-terminal residues of the EnHD sequence. Two positions classified as buried contain charged residues in EnHD (E19 and K52). We reclassified these two positions with the intermediate group because both residues are just below the 20% side-chain SASA cutoff (18.4 and 18.1%, respectively; Table S2) and it is unlikely that the charged residues at these positions are completely buried.
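The classification rule described above can be sketched as a small function. The thresholds and the manual reassignments come from the text; the function names and the handling of values exactly at 20 or 50% are assumptions made for illustration.

```python
# Manual assignments described in the text: termini absent from the
# crystal structure, and two charged residues just under the 20% cutoff.
MANUAL_OVERRIDES = {
    "D1": "surface", "K2": "surface", "K57": "surface", "K58": "surface",
    "E19": "intermediate", "K52": "intermediate",
}


def classify_burial(relative_sasa_percent):
    """Classify a residue by relative side-chain SASA (percent).

    Values exactly at 20 or 50 are placed in the intermediate class here;
    the text does not specify boundary handling.
    """
    if relative_sasa_percent < 20.0:
        return "buried"
    if relative_sasa_percent <= 50.0:
        return "intermediate"
    return "surface"


def classify_residue(name, relative_sasa_percent=None):
    """Apply manual overrides first, then fall back to the SASA thresholds."""
    if name in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[name]
    if relative_sasa_percent is None:
        raise ValueError(f"no SASA value available for {name}")
    return classify_burial(relative_sasa_percent)
```

For example, `classify_burial(18.4)` returns `"buried"`, whereas `classify_residue("E19", 18.4)` returns `"intermediate"` because of the manual reassignment.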

Analysis of feature overlaps

The overlap between features is determined as the number of substitutions shared between two features. The expected overlap (E) between features if the features are uncorrelated can be determined as the mean of a hypergeometric distribution (sampling substitutions without replacement) determined as follows:

E = \frac{n \times K}{N},

(3)

where n is the number of substitutions comprising feature 1, K is the number of substitutions comprising feature 2, and N is the total number of substitutions that are sampled (29 for all cases in the study).
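As a brief sketch of Eq. 3, the expected overlap can be computed directly and checked against the mean of the explicit hypergeometric distribution. The function names are hypothetical, and pairing the 19 charge-state-changing substitutions with the 19 surface substitutions is only an illustrative choice of inputs.

```python
from math import comb


def expected_overlap(n, K, N):
    """Eq. 3: mean overlap between two uncorrelated features."""
    return n * K / N


def hypergeom_pmf(k, n, K, N):
    """Probability that exactly k of the n substitutions in feature 1
    also belong to feature 2 (K of N substitutions in total)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Illustrative inputs: two features of 19 substitutions each, out of 29.
n, K, N = 19, 19, 29
mean_from_pmf = sum(k * hypergeom_pmf(k, n, K, N) for k in range(min(n, K) + 1))
e = expected_overlap(n, K, N)  # 361 / 29, about 12.4 substitutions
```

An observed overlap well above or below this expectation would suggest that the two features are correlated rather than independent.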

Results

Substitutions by residue charge state

To explore whether consensus stabilization results from increases in the number of charged residues, we designed HD variants that combine substitutions that contribute to the charge differences between EnHD and CHD. In a study of seven consensus proteins, we found that consensus sequences often contain a higher percentage of charged residues than their naturally occurring counterparts (16). This enhancement in charged residues is quite pronounced for CHD; CHD is made up of 50% charged residues, whereas EnHD is made up of 40% charged residues (Fig. 1 A), and HDs, on average, are made up of ∼30% charged residues (16). Of the 29 sequence differences between EnHD and CHD, 19 differences alter charge state (dark purple, Fig. 1, A and B). Nine of these 19 differences increase the net positive charge of CHD (negatively charged residues to a neutral or positively charged residue, or neutral residue to a positively charged residue). The other 10 differences decrease the positive charge of CHD. As a result of these 19 substitutions, the number of charged residues increases by six (29 out of 58 residues are charged in CHD), although the overall positive net charge decreases from +11 for EnHD to +5 in CHD. Of the 10 sequence differences that preserve the charge state, eight involve uncharged residues in both proteins; the other two substitute lysines with arginines (lavender, Fig. 1, A and B).

Consensus substitutions grouped by change in residue charge state. (A) Alignment of EnHD and CHD sequences. Sequence differences that change the charge state are shown in dark purple; differences that maintain the charge state are shown in lavender. Locations of α-helices are shown above the sequences. (B) Sequence differences mapped onto the EnHD structure (Protein Data Bank: 2JWT). Residues that differ between CHD and EnHD are shown with Cα atoms as spheres and are colored as in (A). (C) Representative GdnHCl-induced unfolding transitions for EnHD, EnHD CS19, EnHD CM10, and CHD (colored as in D). CD-values are normalized to span from 0 to 1. Curves are two-state fits (Eq. 1). Here and in all subsequent figures, data for EnHD (*black*) are from a previous study (18). (D) Effects on folding free energies relative to EnHD. Error bars are uncertainties determined by Eq. 2. (E) Effects on folding free energies relative to EnHD normalized for the number of substitutions (ΔN). Error bars are uncertainties from (D) divided by ΔN. To see this figure in color, go online.

To examine the role of these charged residue differences in consensus stabilization, we designed a variant that introduces the consensus residues at the 19 positions that differ in charge state into EnHD, which we refer to as EnHD “charge state swapping” (CS19). We also made the complement to this set of substitutions, which differs from EnHD at the 10 positions that maintain the charge state, which we refer to as EnHD “charge state maintaining” (CM10). EnHD CS19 and CM10 have net charges of +6 and +8, respectively.

EnHD CS19 and EnHD CM10 are both stabilized relative to EnHD (Fig. 1 C) by similar amounts. Compared with EnHD, EnHD CS19 and EnHD CM10 have ΔΔG°_H2O-values of −2.15 ± 0.02 and −2.30 ± 0.02 kcal mol⁻¹, respectively (Fig. 1 D; Table S1). Thus, neither the charge-state-swapping nor the charge-state-maintaining substitutions achieve the full 4.27 ± 0.02 kcal mol⁻¹ increase in stability for all 29 consensus substitutions (Fig. 1 D). However, when stability increases are normalized for the number of substitutions in each set, the charge-state-maintaining substitutions provide a greater stability enhancement per substitution than the charge-state-swapping substitutions. The 10 charge-state-maintaining substitutions provide −0.23 ± 0.01 kcal mol⁻¹ per substitution, whereas the 19 charge-state-swapping substitutions provide an average of −0.11 ± 0.01 kcal mol⁻¹ per substitution (Fig. 1 E). We note that the stabilization afforded by the charge-swapping substitutions is not likely to arise from long-range coulombic interactions between charged residues because stabilities are determined at very high ionic strength (2–6 molar guanidinium chloride). Even on the low end of this ionic strength range, long-range coulombic interactions are likely to be screened, as evidenced by the insensitivity of the guanidine-induced unfolding transitions of EnHD to sodium chloride concentration (Fig. S2).
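The per-substitution normalization reported above is simple arithmetic, sketched here with the ΔΔG°_H2O-values quoted in the text. The function name is hypothetical; the uncertainty is divided by the number of substitutions, as described in the legend for Fig. 1 E.

```python
def per_substitution(ddg, err, n_subs):
    """Divide a free energy change and its uncertainty by the number
    of substitutions (delta-N) in the variant."""
    return ddg / n_subs, err / n_subs


# Values in kcal/mol, relative to EnHD, as quoted in the text.
cm10 = per_substitution(-2.30, 0.02, 10)  # charge-state-maintaining set
cs19 = per_substitution(-2.15, 0.02, 19)  # charge-state-swapping set
```

The divided uncertainties (0.002 and about 0.001 kcal mol⁻¹) are smaller than the ± 0.01 quoted in the text, which presumably reflects rounding to two decimal places.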

Substitutions by side-chain burial

To determine the extent to which consensus stabilization results from substitution of buried versus solvent-exposed residues, we designed HD variants that combine substitutions at positions with similar degrees of side-chain burial. The 29 sequence differences between EnHD and CHD occur at 19 surface positions, seven intermediate positions, and three buried positions (Fig. 2, A and B; Table S2).

Consensus substitutions grouped by residue solvent accessibility in EnHD. (A) Alignment of EnHD and CHD sequences. Sequence differences at surface, intermediate, and buried sites are shown in blue, green, and yellow, respectively. (B) Sequence differences mapped onto the EnHD structure. Residues that differ between EnHD and CHD are shown with Cα atoms as spheres and are colored as in (A). (C) Representative GdnHCl-induced unfolding transitions for EnHD, EnHD B3, EnHD BI10, EnHD S19, EnHD SI26, and CHD. CD-values are normalized to span from 0 to 1. Curves are two-state fits (Eq. 1). Constructs are colored as in (D). (D) Effects on folding free energies relative to EnHD. Error bars are uncertainties determined by Eq. 2. (E) Effects on folding free energies relative to EnHD normalized for the number of substitutions (ΔN). Error bars are uncertainties from (D) divided by ΔN. To see this figure in color, go online.

We designed four variants that each introduce consensus residues with similar extents of burial into the EnHD background. EnHD “buried” consensus (B3) contains the three buried consensus substitutions, EnHD “buried and intermediate” (BI10) contains the 10 buried/intermediate consensus substitutions, EnHD “surface” consensus (S19) contains the 19 surface consensus substitutions, and EnHD “surface and intermediate” (SI26) contains the 26 surface and intermediate substitutions (Fig. 2 C). Relative to EnHD, all four variants are stabilized. The buried and intermediate consensus substitutions show the smallest stability enhancement; the folding free energies of the EnHD B3 and EnHD BI10 variants are decreased by −0.39 ± 0.01 and −1.32 ± 0.01 kcal mol⁻¹ compared with EnHD (Fig. 2 D; Table S1). In contrast, the surface consensus substitutions impart a greater stability enhancement; the folding free energy of the EnHD S19 variant is decreased by −3.46 ± 0.02 kcal mol⁻¹ (Fig. 2 D; Table S1). The stabilizing contributions from the surface consensus substitutions are further underscored when stability increases are normalized for the number of substitutions in each set. Of the four variants that probe burial and accessibility, the surface consensus substitutions show the largest stability enhancement, of 0.18 ± 0.01 kcal mol⁻¹ per substitution, whereas the buried and the buried-plus-intermediate consensus substitutions each contribute 0.13 kcal mol⁻¹ per substitution (Fig. 2 E).

Substitutions by positional conservation

To determine the contributions of residue conservation to the consensus stability enhancement, we made variants that combine residues with different degrees of conservation. Conservation at a position i in an MSA was quantified using sequence information (SI_i), calculated using the following formula:

SI_{i} = \log_{2} 20 + \sum_{j \in \{Ala, Cys, \dots, Trp\}} f_{i, j} \log_{2} f_{i, j},

(4)

where f_i,j represents the frequency of residue j at position i. SI_i is the increase in entropy in going from the observed residue frequency distribution at position i (the sum term in Eq. 4) to a random distribution, where all 20 residues have equal frequencies of 0.05. Positions with high SI are strongly conserved, whereas positions with low SI are weakly conserved; the maximal and minimal values of SI with this random distribution are 4.32 and 0 bits, respectively.
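A minimal sketch of the sequence information calculation follows, computing SI as log₂ 20 minus the Shannon entropy of the observed residue frequencies so that the limits match those stated above (4.32 bits for an invariant position, 0 bits for a uniform one). The function name is hypothetical, and the sketch assumes that gap characters have already been removed from the column; how gaps were handled in the study is not stated in this section.

```python
import math
from collections import Counter


def sequence_information(column):
    """Sequence information (bits) for one alignment column.

    column: iterable of one-letter residue codes at a single MSA
    position, with gap characters already removed (an assumption).
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(20) - entropy


# Limiting cases: an invariant column and a perfectly uniform column.
si_invariant = sequence_information("AAAAAAAA")
si_uniform = sequence_information("ACDEFGHIKLMNPQRSTVWY")
```

Residues that never appear at a position contribute nothing to the sum, which is why iterating only over observed residues is sufficient.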

We calculated SI at each position of the HD from an MSA composed of 4571 sequences (see Supporting materials and methods). HD positions show a broad range of conservation, with values of SI ranging from 0.44 to 3.98, with a median value of 1.58 (Fig. 3 C). Positions with the same residues in EnHD and CHD (residues with black background in Fig. 3 A, black bars in Fig. 3 C) are on average more highly conserved (median SI of 2.12 bits) than positions that differ (residues with gray or colored backgrounds in Fig. 3 A, gray or colored bars in Fig. 3 C; median SI of 0.92 bits). However, not all positions that differ between EnHD and CHD have low SI-values. Seven such positions have SI-values greater than the median sequence information across all positions. To the extent that effects on stability correlate with residue frequencies, consensus substitutions at positions with high SI-values may be expected to show the greatest increases in stability.

Consensus substitutions grouped by positional conservation. (A) Alignment of EnHD and CHD sequences. Residues with black backgrounds are identical in EnHD and CHD. Residues with gray background are the group of 21 differences at weakly conserved positions. Residues with other colors are the group of eight differences at strongly conserved positions. (B) Sequence differences mapped onto the EnHD structure. Cα atoms are shown as spheres and are colored as in (A). (C) Sequence information (Eq. 4) at each position from an alignment of 4571 HD sequences. Bars are colored as in (A). (D) Representative GdnHCl-induced unfolding transitions for EnHD SC8 and EnHD WC21. (E) Effects of multiple residue variants on folding free energies relative to EnHD. (F) Effects on folding free energy changes relative to EnHD normalized for the number of substitutions (ΔN). Red and black transitions (D and G) are for EnHD and CHD. (G) Representative GdnHCl-induced unfolding transitions for EnHD single-residue variants (colored as in A). (H) Effects of single-residue substitutions in EnHD on folding free energies. Error bars in (E and H) are uncertainties determined by Eq. 2. Error bars in (F) are uncertainties from (E) divided by ΔN. To see this figure in color, go online.

To determine the contributions to stability from residues at positions of high conservation, we designed a variant that combines consensus substitutions at eight of the most strongly conserved positions into the EnHD background, which we refer to as EnHD “strongly conserved” (SC8). We also made the complementary variant, which introduces the consensus substitutions at the remaining 21 “weakly conserved” positions, which we refer to as EnHD WC21.

EnHD SC8 and EnHD WC21 both show unfolding transitions intermediate between EnHD and CHD (Fig. 3 D). EnHD SC8 and EnHD WC21 show decreases in folding free energies relative to EnHD of −1.86 ± 0.10 and −2.82 ± 0.06 kcal mol⁻¹, respectively, indicating that both sets of substitutions are stabilizing, with the weakly conserved substitutions being slightly more stabilizing (Fig. 3 E; Table S1). When these free energy changes are normalized for the number of substitutions, the strongly conserved substitutions decrease the folding free energy by −0.23 ± 0.01 kcal mol⁻¹ per substitution, whereas the weakly conserved substitutions decrease the folding free energy by −0.13 ± 0.01 kcal mol⁻¹ per substitution (Fig. 3 F). Although it is not unexpected that consensus substitutions at positions of high conservation contribute more to stability on average than at positions of low conservation, the larger number of sequence differences at weakly conserved positions means that a considerable stabilization is provided by consensus substitution at weakly conserved positions.

To further explore the relationship between conservation and stability, we measured the stabilities of individual substitutions from the eight strongly conserved positions in the EnHD background (Fig. 3 G; Table S1). Four of the eight strongly conserved substitutions are stabilizing, three are destabilizing (although the destabilizing effect of I45V is quite small), and one has no effect on stability. On the whole, changes in folding free energies for these eight conserved point substitutions are small; ΔΔG°_H2O-values from EnHD range from −0.93 ± 0.01 to +0.52 ± 0.05 kcal mol⁻¹ with a mean value of −0.24 kcal mol⁻¹ (Fig. 3 H; Table S1), clearly demonstrating that the consensus stability enhancement (−4.27 ± 0.02 kcal mol⁻¹) is not gained through a small number of substitutions at conserved positions.

Nonadditive effects on stability of consensus substitutions

The full consensus HD has 29 sequence differences from D. melanogaster engrailed HD, which is half of the residues in the protein. Thus, although CHD and EnHD have similar tertiary structures (18), they are quite different in sequence. Many of these differences involve residues that are in direct contact (Table S3). Residues in direct contact often make nonadditive contributions to protein stability (26). To determine whether nonadditivity contributes to consensus stability enhancement, we made individual residue substitutions at the eight strongly conserved sites in the CHD background (Fig. 4 A) so that we could compare the stability changes to those measured in the EnHD background (Fig. 3 G). For example, we made the E17K substitution in the CHD background to compare with the K17E substitution made in the EnHD background, thereby allowing us to test the effects of the consensus versus nonconsensus background on stabilization from individual consensus substitutions.

Nonadditive effects of consensus substitutions on stability. (A) Representative GdnHCl-induced unfolding transitions for CHD single-residue variants along with EnHD (*black*) and CHD (*red*). Unfolding transitions are colored as in (B). (B) Correlation of effects on folding free energies of single-residue substitutions toward the consensus in EnHD background (x axis) and CHD background (y axis). Dashed gray line shows the y = x relationship. ΔΔG°_H2O-values are determined from a global fit with a common m-value (see text). Error bars are uncertainties determined by Eq. 2. (C) Additivities of the folding free energies of the eight highly conserved residue substitutions in the EnHD and CHD backgrounds. White bars are the sum of ΔΔG°_H2O-values for the eight single-residue variants in the EnHD (*left*) and CHD (*right*) backgrounds. Error bars are uncertainties propagated from ΔΔG°_H2O-values of the single-residue variants. Gray bars are ΔΔG°_H2O-values when all eight substitutions are made simultaneously in the EnHD and CHD backgrounds; error bars are uncertainties determined by Eq. 2. To see this figure in color, go online.

To compare the effects of substitution on the folding free energies in the EnHD and CHD backgrounds, we determined ΔΔG°_H2O-values in each background for substitution toward the consensus sequence by the following:

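\Delta\Delta G^{\circ}_{\mathrm{H_2O,\,EnHD}} \equiv \Delta G^{\circ}_{\mathrm{H_2O,\,EnHD\ variant}} - \Delta G^{\circ}_{\mathrm{H_2O,\,EnHD}} \qquad (5A)

and

\Delta\Delta G^{\circ}_{\mathrm{H_2O,\,CHD}} \equiv \Delta G^{\circ}_{\mathrm{H_2O,\,CHD}} - \Delta G^{\circ}_{\mathrm{H_2O,\,CHD\ variant}}, \qquad (5B)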

where ΔΔG°_{H2O, EnHD} is the effect of the substitution (toward consensus) in the EnHD background and ΔΔG°_{H2O, CHD} is the effect of the substitution (toward consensus) in the CHD background. With these definitions, substitutions that introduce consensus residues that are stabilizing have negative ΔΔG°_H2O.

In the CHD background, six substitutions toward the consensus residue have negative ΔΔG°_H2O-values (although the stabilizing effect of the I45V substitution is quite small, −0.15 ± 0.04 kcal mol⁻¹), whereas only four are stabilizing in the EnHD background (Fig. 4, A and B). Thus, two consensus substitutions (L26P and I45V) that are destabilizing in the EnHD background become stabilizing in the CHD background. Furthermore, whereas the S35A substitution is stabilizing by 0.93 ± 0.01 kcal mol⁻¹ in the EnHD background, it is stabilizing by 1.33 ± 0.01 kcal mol⁻¹ in the CHD background. Thus, the consensus background enhances the stabilizing effects of these three consensus substitutions. This nonadditivity is not the result of using a shared m-value in fits using a global model; stabilization in the consensus background is even larger when determined using the free energies from the local fits (Fig. S3).
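As an illustration of the background dependence described above, the following minimal sketch compares the effect of the position-35 substitution in the two backgrounds, using the values quoted in the text. The function name `background_coupling` and the quadrature combination of uncertainties (which assumes independent errors) are assumptions made for illustration, not part of the reported analysis.

```python
import math

def background_coupling(ddg_enhd, ddg_chd):
    """Return (difference, uncertainty) of a substitution's effect in CHD vs. EnHD.

    Each argument is a (value, uncertainty) pair in kcal/mol. A negative
    difference means the substitution is more stabilizing in the CHD background.
    Uncertainties are combined in quadrature (assumes independent errors).
    """
    diff = ddg_chd[0] - ddg_enhd[0]
    err = math.hypot(ddg_chd[1], ddg_enhd[1])
    return diff, err

# Position-35 substitution: stabilizing by 0.93 (EnHD) and 1.33 (CHD) kcal/mol,
# so both ddG values are negative under the sign convention of Eqs. 5A and 5B.
pos35_enhd = (-0.93, 0.01)
pos35_chd = (-1.33, 0.01)

coupling, coupling_err = background_coupling(pos35_enhd, pos35_chd)
print(f"{coupling:.2f} +/- {coupling_err:.2f} kcal/mol")
```

Under these assumptions, the consensus background adds roughly 0.4 kcal mol⁻¹ of stabilization to this single substitution.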

Nonadditive effects of consensus residue substitutions can also be seen by comparing stability enhancement of the combined SC8 substitution with the sum from single-residue substitutions in the EnHD and CHD backgrounds. The sum of ΔΔG°_H2O-values for the eight single-residue substitutions in the EnHD background (−1.95 ± 0.08 kcal mol⁻¹) is nearly the same as the ΔΔG°_H2O-value for EnHD SC8 relative to EnHD (−1.86 ± 0.10 kcal mol⁻¹), closely approximating additivity (Fig. 4 C). In contrast, the sum of ΔΔG°_H2O-values for the eight single-residue substitutions in the CHD background (−3.59 ± 0.07 kcal mol⁻¹) is more than twice as large in magnitude as the ΔΔG°_H2O-value for CHD relative to EnHD WC21 (−1.44 ± 0.06 kcal mol⁻¹; Fig. 4 C), demonstrating clear nonadditivity. Thus, on average, consensus substitutions are more stabilizing in the full consensus background than they are in a protein that is eight substitutions away from the consensus. Again, this nonadditivity is not the result of fitting with a shared m-value because nonadditivity in the CHD background is even larger when free energies from local fits (Table S1) are used.
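The additivity comparison above can be sketched numerically. The snippet below takes the four values quoted in the text and computes, for each background, the gap between the sum of single-substitution effects and the effect of the combined substitution. The helper name `additivity_gap`, the quadrature error propagation, and the reading of the reported uncertainties as independent standard errors are assumptions for illustration only.

```python
import math

def additivity_gap(sum_singles, combined):
    """Return (gap, uncertainty): sum of single-substitution ddG minus combined ddG.

    Both arguments are (value, uncertainty) pairs in kcal/mol. A gap near zero
    indicates additivity; uncertainties are combined in quadrature.
    """
    gap = sum_singles[0] - combined[0]
    err = math.hypot(sum_singles[1], combined[1])
    return gap, err

# (value, uncertainty) in kcal/mol, as reported in the text
enhd = additivity_gap((-1.95, 0.08), (-1.86, 0.10))  # singles vs. SC8 in EnHD
chd = additivity_gap((-3.59, 0.07), (-1.44, 0.06))   # singles vs. CHD minus WC21

for name, (gap, err) in (("EnHD", enhd), ("CHD", chd)):
    ratio = abs(gap) / err
    print(f"{name}: gap = {gap:.2f} +/- {err:.2f} kcal/mol (|gap|/error = {ratio:.1f})")
```

With these inputs the EnHD gap is smaller than its propagated uncertainty, whereas the CHD gap exceeds its uncertainty many times over, which is the pattern the text describes as additive versus clearly nonadditive.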

DNA binding affinities of consensus variants

Using ITC, we previously found that the affinity of CHD for its cognate DNA recognition sequence (5′-TAATTA-3′) was increased by ∼200-fold compared with EnHD (18). Here, we have measured DNA affinities of the variants containing different groups of substitutions (Fig. S4). Binding free energies for all variants are increased relative to EnHD (Fig. 5 A; Table S4), but none have an affinity as high as CHD. The variant with consensus substitutions at the 21 least conserved sites shows the largest increase in affinity. Those with 19 charge-swapping and eight strongly conserved substitutions show the smallest increases in affinity (Fig. 5 A); note that these two substitution sets (CS19 and SC8) have little overlap (only one residue in common out of 27 residues combined, Fig. 6 A). This suggests that significant affinity enhancement should result from the complement of these two sets. This complement comprises only three substitutions that are both charge maintaining and weakly conserved, namely, an A7T substitution, a T27S substitution, and an N41T substitution, the first of which is at the protein-DNA interface in a crystal structure of the EnHD bound to its cognate DNA sequence (27). In terms of contributions to binding free energy per residue, the buried and intermediate and the charge-maintaining substitutions make the largest contributions to DNA affinity; together, these two variants contain 15 substitutions, five of which are common to both, including I47V, which makes direct contacts with DNA in the crystal structure.

Contributions of different types of residues to DNA affinity compared with protein stability. (A) Total free energy changes of DNA binding relative to EnHD for different types of substitutions relative to free energy changes for protein folding. Negative values correspond to increased affinity and folding stability. (B) Free energy changes for DNA binding and folding, normalized to the number of substitutions for each variant. To see this figure in color, go online.

Stability contributions of substitutions of residues in different sequence and/or structure classes. (A) Contingency tables for the overlap of the residue substitutions for each feature. The top number indicates the number of substitutions shared by the two features. The bottom number in parentheses indicates the expected shared number of substitutions if the features were uncorrelated as determined by Eq. 3. The Matthews correlation coefficient (φ) between features is given below each table. The p-value of feature correlation is determined by Fisher’s exact test. (B) Correlation of effects on folding free energies with the number of substitutions encompassed for each feature. Errors in ΔΔG°_H2O-values are smaller than the size of the symbols. (C) Total effects and (D) per substitution effects on stability from consensus substitutions into EnHD for each feature. Error bars in (B and C) are uncertainties determined by Eq. 2. Error bars in (D) are uncertainties from (C) divided by the number of substitutions for each variant. To see this figure in color, go online.

Discussion

The large enhancement in stability of CHD relative to EnHD must arise from the 29 residue differences between the two proteins. However, it is unclear how this stability enhancement is partitioned among the 29 substitutions. Does a small subset of substitutions contribute the bulk of the stability enhancement? Are there particular types of substitutions that contribute the bulk of the stability enhancement? Are these contributions additive? By introducing sets of residue substitutions with shared sequence and/or structural features, we can assess and rank the contributions of these features to the consensus stability enhancement.

Overlap between groups of substitutions

When comparing changes in folding free energies for the different sets of substitutions, the degree of overlap between different groups needs to be considered. Because complementary groups of substitutions (for example, charge swapping and charge maintaining) together comprise all 29 sequence differences between EnHD and CHD, overlap between substitution groups that probe different features is unavoidable. The strongest overlap is between the substitutions that probe charge swapping and those that probe conservation; 18 of the 19 charge-swapping substitutions are also in the weakly conserved group of 21 substitutions (Fig. 5 A). Indeed, the CS19 and WC21 variants differ by only four substitutions (Fig. 5 A; Table S5). This is partly a result of the large number of substitutions in each group (19 and 21); from a hypergeometric probability distribution, we would expect an overlap of ∼14 substitutions if these two features sorted independently (see Materials and methods). Likewise, there is considerable overlap between strongly conserved and charge-maintaining substitutions; seven out of eight strongly conserved substitutions are charge maintaining, whereas a three-residue overlap would be expected from independent assortment. Together, these overlaps produce a Matthews correlation coefficient (φ) of 0.68 for the charge versus conservation groups.

For variants that probe solvent exposure (BI10 and S19) and conservation (SC8 and WC21), overlaps are closer to what would be expected from independent assortment (Fig. 5 A). Correspondingly, the Matthews correlation coefficient among these two groups is lower (0.36). The largest overlap among these groups is that 16 of the 19 surface substitutions are in the weakly conserved group of 21 substitutions. Although this overlap is only two residues in excess of what would be predicted from random assortment, it does mean that these two groups are reporting on many of the same substitutions. Thus, differences in stability between S19 and WC21 come from a small group of substitutions that probe one but not the other feature. Variants that probe solvent exposure and charge state have even less overlap, with a Matthews correlation coefficient of 0.24.
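The overlap statistics discussed in this section can be reproduced approximately from the group sizes alone. The sketch below assumes that the expected overlap is the mean of the hypergeometric distribution (group size A × group size B / 29) and that φ is the standard phi coefficient of the 2 × 2 membership table; the function names are hypothetical, and the shared count of 14 for the surface and charge-swapping groups is inferred from counts given elsewhere in the text rather than stated directly.

```python
import math

N_TOTAL = 29  # sequence differences between EnHD and CHD

def expected_overlap(n_a, n_b, n_total=N_TOTAL):
    """Overlap expected by chance: mean of the hypergeometric distribution."""
    return n_a * n_b / n_total

def matthews_phi(n_a, n_b, n_both, n_total=N_TOTAL):
    """Phi coefficient of the 2x2 table for membership in group A and group B."""
    a_only = n_a - n_both
    b_only = n_b - n_both
    neither = n_total - n_a - n_b + n_both
    numerator = n_both * neither - a_only * b_only
    denominator = math.sqrt(n_a * (n_total - n_a) * n_b * (n_total - n_b))
    return numerator / denominator

# Group sizes and shared counts taken from the text
exp_cs_wc = expected_overlap(19, 21)      # charge swapping vs. weakly conserved
phi_charge_cons = matthews_phi(19, 21, 18)
phi_surface_cons = matthews_phi(19, 21, 16)
phi_surface_charge = matthews_phi(19, 19, 14)  # shared count of 14 is inferred

print(f"expected CS19/WC21 overlap: {exp_cs_wc:.1f}")
print(f"phi charge vs. conservation: {phi_charge_cons:.3f}")
print(f"phi exposure vs. conservation: {phi_surface_cons:.3f}")
print(f"phi exposure vs. charge: {phi_surface_charge:.3f}")
```

Because each feature splits the 29 substitutions into two complementary groups, a single shared count fixes the whole 2 × 2 table, which is why only three numbers per comparison are needed.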

Comparison of effects of different sequence and structural features on folding stability

Across the three features we examined (residue charge state, residue burial, and conservation), all sets of consensus substitutions were found to be stabilizing. Thus, stabilizing effects from consensus substitutions do not appear to be limited to a specific feature. For all variants we tested, we see a strong positive correlation between the extent of stabilization and the number of substitutions made (Fig. 5 B). This suggests that consensus stabilization arises from small, incremental effects from substitutions regardless of features and that a simple way to gain additional stability is to include more consensus substitutions. However, this correlation is not absolute. For example, although both the charge-state-swapping substitutions and the surface substitutions each contain 19 substitutions, the surface substitutions increase folding stability by 1.3 kcal mol⁻¹ more than the charge-state-swapping substitutions (Fig. 5 C). This stability increment must result from the 10 sequence differences between these two variants, either from stabilization by the five surface substitutions that are charge maintaining or destabilization from substitutions that are charge swapping but are not on the surface.

In terms of per residue effects, the largest stabilization is seen for the charge-maintaining and strongly conserved substitutions (−0.23 kcal mol⁻¹ residue⁻¹ for each group; Fig. 5 D), two substitution sets that are strongly overlapping. This indicates that if a high-stability increment is to be achieved from a limited number of consensus substitutions, targeting conserved sites that maintain charge seems a good bet. In terms of total effects, the largest stabilization is seen for the surface (S19) and weakly conserved (WC21) groups (ΔΔG°_H2O = −3.46 and −2.82 kcal mol⁻¹, respectively), demonstrating that a large portion of consensus stabilization comes from what seems to be an unlikely source: residues that are minimally constrained by structure and/or evolution. The large overall contribution from the weakly conserved substitutions (Fig. 5 C) in this study indicates that even modest sequence biases reflect differences in stability, an observation that is consistent with findings from Swint-Kruse and co-workers that weakly conserved surface residues are important for protein function (28,29).

Makhatadze’s group has also identified stabilizing substitutions on the surfaces of acyl phosphatase and Cdc42 GTPase (30). However, unlike the stabilizing CM10 substitutions here, the stabilizing substitutions identified by the Makhatadze group are, by design, charge substitutions (charge reversals and introduction of charges at neutral sites). Also unlike the stabilizing substitutions here, those of Makhatadze and co-workers are generally substitutions away from the consensus. Makhatadze suggested this may result from the dependence of electrostatic interactions on pairs of charged residues because pairwise correlations are poorly represented in consensus sequences. This may explain why the charge-swapping substitutions in this study are less stabilizing than the charge-maintaining substitutions, although it should be remembered that the consensus stability increment is manifested in high concentrations of guanidinium chloride and thus does not result from long-range coulombic interactions among charged groups of the kind engineered by Makhatadze and co-workers. It is possible that if the stability of CS19 and CM10 could be measured at low ionic strength and compared with engrailed, a larger stabilization would be observed for the former, although this stabilization would be in excess of that examined here.

Comparison of complementary sets of substitutions within a single feature provides a measure of stability that is by definition free from overlap. Per residue, greater stabilization is afforded from consensus substitutions at strongly conserved sites than from substitutions at weakly conserved sites (ΔΔG°_H2O/ΔN = −0.23 vs. −0.13 kcal mol⁻¹ residue⁻¹), from surface consensus substitutions than from buried and intermediately exposed substitutions (ΔΔG°_H2O/ΔN = −0.18 vs. −0.13 kcal mol⁻¹ residue⁻¹), and from charge-maintaining consensus substitutions than from charge-swapping substitutions (ΔΔG°_H2O/ΔN = −0.23 vs. −0.11 kcal mol⁻¹ residue⁻¹). This last observation suggests that, even though consensus proteins are enriched in ionizable residues (16), these substitutions seem not to be the dominant source of increased folding stability.
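The per-residue values quoted above follow from dividing each total ΔΔG°_H2O by the number of substitutions in the variant. As a brief check, the sketch below does this for the three variants whose totals are stated in the text (SC8, WC21, and S19); the dictionary layout and variable names are illustrative choices only.

```python
# Total ddG (kcal/mol) and number of substitutions, as given in the text
variants = {
    "SC8": (-1.86, 8),    # strongly conserved
    "WC21": (-2.82, 21),  # weakly conserved
    "S19": (-3.46, 19),   # surface
}

# Per-substitution effect: total ddG divided by the number of substitutions
per_substitution = {name: total / n for name, (total, n) in variants.items()}

for name, value in per_substitution.items():
    print(f"{name}: {value:.2f} kcal/mol per substitution")
```

The ranking by total effect (S19, then WC21, then SC8) reverses for SC8 once the totals are normalized, which is the distinction the text draws between total and per-substitution contributions.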

Comparison of changes to folding stability and DNA binding

CHD is not only considerably more stable than EnHD; it also binds DNA considerably more tightly. All groups of substitutions examined here increase folding stability and binding. However, they do so to different extents. Fig. 5 shows that, excluding EnHD and CHD, there is little correlation between the folding and binding free energy increments provided by the different substitution sets. Per residue, the strongly conserved substitutions produce a large increase in stability but have an intermediate effect on binding; conversely, the buried and intermediate residues produce a large increase in binding affinity but have an intermediate effect on folding stability. The 10 charge-maintaining substitutions are an exception, increasing both folding stability and binding affinity. Overall, the comparison in Fig. 5 indicates that, although consensus design increases both binding affinity and folding stability, it does so using different mechanisms and residues.

Contribution of nonadditivity to consensus stabilization

A comparison of identical consensus substitutions in the EnHD and CHD backgrounds reveals favorable couplings among residues in the consensus background compared with the EnHD background. Whereas five out of eight single-residue substitutions (K2R, S9T, K17E, I47V, and K52R) show similar effects on stability in both backgrounds, three substitutions (L26P, S35A, and I45V) show a larger stabilizing effect when introduced into the CHD background than the EnHD background (Fig. 4 B). Moreover, two of these background-dependent substitutions (L26P and I45V) are destabilizing in the EnHD background but become stabilizing (albeit marginally so) in the CHD background (Fig. 4 B). This indicates mutual stabilization between each of the L26P, S35A, and I45V substitutions and the other 28 substitutions that define the consensus HD background.

Formally, the background dependence of the ΔΔG°_H2O-values for the L26P, S35A, and I45V substitutions represents the coupling of these residues to the entire EnHD and CHD backgrounds, which differ by the remaining 28 mismatching positions. We cannot identify the specific interactions that give rise to this background dependence, although nonadditive effects of substitutions typically occur between residues in direct contact (in some cases, longer range interactions can result from electrostatic effects or structural perturbations caused by the substitutions) (26). Among the eight individual substitutions we tested, the residues at the three positions showing nonadditivity make either two or three side-chain-side-chain contacts with other residues in the EnHD structure (Table S3). The residues at the other five positions make, at most, one side-chain-side-chain contact. Furthermore, the residues at the three positions showing nonadditivity make contacts with one or both of the other positions showing nonadditivity. Interestingly, the three background-dependent substitutions are the three buried substitutions in the protein core (Figs. 2 A and 4 B). This suggests a potential mechanism underlying consensus protein stability enhancement involving favorable couplings among strongly conserved groups of residues in the protein core that are absent, on average, in the cores of natural proteins.

Three previous studies have explored nonadditivity in consensus stabilization. Fersht and co-workers found that p53 DNA binding domain variants containing five different combinations of the same four consensus substitutions showed additive effects (17). However, the four substitutions chosen were at positions distant in structure (ranging from 11 to 24 Å away), so additive effects among the substitutions may have been expected (26). Likewise, in a separate study, Fersht and co-workers found that consensus substitutions in two variants containing six consensus substitutions in Escherichia coli GroEL minichaperones were additive; notably, some of these substitutions were close in space (31). In contrast, Magliery and co-workers found that consensus substitutions in a triosephosphate isomerase (TIM) were nonadditive, albeit in the opposite direction from what we have observed here: combining 13 consensus substitutions (plus an additional nonconsensus substitution) that were all individually stabilizing resulted in a variant that was slightly destabilized relative to the wild-type TIM, suggesting that consensus substitutions synergistically destabilize one another (32).

In addition to the coupling of individual substitutions to the CHD versus EnHD background, we see clear nonadditivity among the eight consensus substitutions in the CHD background. When the eight strongly conserved consensus substitutions are introduced individually into the full consensus background, the sum of the stability enhancements exceeds that obtained when all eight substitutions are made simultaneously (comparing CHD with WC21; Fig. 4 C). In this format, the single substitutions in the consensus background can each be thought of as a "last substitution," in which all other nonconsensus residues have been replaced with consensus residues. This observation indicates that the eight strongly conserved substitutions mutually stabilize one another in the consensus background. The observation that the same eight substitutions appear to be additive in the EnHD environment (comparing the sum of the single substitutions with SC8 minus EnHD; Fig. 4 C) indicates that the synergy among the eight strongly conserved residues also requires consensus residues at the 21 positions of lower conservation, indicating that nonadditivity includes "higher-order" couplings involving three or more residues. Furthermore, the observation that individual consensus substitutions are more stabilizing in the consensus background than in the EnHD background suggests that making individual consensus substitutions in a naturally occurring sequence may not provide much stabilization, as has been seen in a number of studies (15,17,31,33).

Superficially, the finding of stabilizing energetic couplings between residues in a consensus protein is somewhat surprising because the consensus design method selects residues at each position without regard for sequence at any other position. How are these energetic couplings captured in a consensus sequence? If stabilizing energetic couplings are selected by evolution, the energetic couplings may give rise to positive sequence covariance between favorably coupled residues. Although covariance information is not explicitly used in consensus design, positive covariance should enhance frequencies for both covariant residues. Thus, positive covariances may be implicitly encoded within the conservation of residues at a single position, and consensus sequences may capture these covariant residues. However, the extent to which sequence covariance reflects stabilizing interactions remains an open question. Ranganathan and co-workers have demonstrated that including residues that coevolve is necessary for proteins to adopt their native folds (34). However, the results from Magliery and co-workers described above demonstrate that excluding residues that coevolve improves the identification of stabilizing substitutions (32). The role of sequence covariance in stabilizing (or destabilizing) consensus-based sequences is a topic worthy of further study.

Author contributions

M.S., K.W.T., and D.B. designed the research. M.S. and K.W.T. performed the research. M.S., K.W.T., and D.B. analyzed data. M.S., K.W.T., and D.B. wrote the manuscript.

Acknowledgments

The authors thank the Johns Hopkins University Center for Molecular Biophysics for providing facilities, instrumentation, and resources.

This work was supported by National Institutes of Health (NIH)/National Institute of General Medical Sciences (NIGMS) research grant R01 GM068462 to D.B., NIH/NIGMS training grant T32 GM008403 for M.S., and NIH/NIGMS fellowship F31 GM128295 to M.S.

Editor: Elizabeth Rhoades.

Footnotes

Matt Sternke and Katherine W. Tripp contributed equally to this work.

Supporting material can be found online at https://doi.org/10.1016/j.bpj.2021.10.035.

Supporting citations

References (35, 36) appear in the Supporting materials and methods

Supporting material

Document S1. Supporting materials and methods, Figs. S1–S4, and Tables S1–S5

Document S2. Article plus supporting material

References

1. Taverna D.M., Goldstein R.A. Why are proteins marginally stable? Proteins. 2002;46:105–109. doi: 10.1002/prot.10016.
2. Zeldovich K.B., Chen P., Shakhnovich E.I. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc. Natl. Acad. Sci. USA. 2007;104:16152–16157. doi: 10.1073/pnas.0705366104.
3. Razvi A., Scholtz J.M. Lessons in stability from thermophilic proteins. Protein Sci. 2006;15:1569–1578. doi: 10.1110/ps.062130306.
4. Koga N., Tatsumi-Koga R., et al. Baker D. Principles for designing ideal protein structures. Nature. 2012;491:222–227. doi: 10.1038/nature11600.
5. Itzhaki L.S., Otzen D.E., Fersht A.R. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol. 1995;254:260–288. doi: 10.1006/jmbi.1995.0616.
6. Nisthal A., Wang C.Y., et al. Mayo S.L. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc. Natl. Acad. Sci. USA. 2019;116:16367–16377. doi: 10.1073/pnas.1903888116.
7. Geiger-Schuller K., Sforza K., et al. Barrick D. Extreme stability in de novo-designed repeat arrays is determined by unusually stable short-range interactions. Proc. Natl. Acad. Sci. USA. 2018;115:7539–7544. doi: 10.1073/pnas.1800283115.
8. Mravic M., Thomaston J.L., et al. DeGrado W.F. Packing of apolar side chains enables accurate design of highly stable membrane proteins. Science. 2019;363:1418–1423. doi: 10.1126/science.aav7541.
9. Huang P.-S., Oberdorfer G., et al. Baker D. High thermodynamic stability of parametrically designed helical bundles. Science. 2014;346:481–485. doi: 10.1126/science.1257481.
10. Giver L., Gershenson A., et al. Arnold F.H. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA. 1998;95:12809–12813. doi: 10.1073/pnas.95.22.12809.
11. Arnold F.H. The nature of chemical innovation: new enzymes by evolution. Q. Rev. Biophys. 2015;48:404–410. doi: 10.1017/S003358351500013X.
12. Broom A., Trainor K., et al. Meiering E.M. Computational modeling of protein stability: quantitative analysis reveals solutions to pervasive problems. Structure. 2020;28:717–726.e3. doi: 10.1016/j.str.2020.04.003.
13. Porebski B.T., Buckle A.M. Consensus protein design. Protein Eng. Des. Sel. 2016;29:245–251. doi: 10.1093/protein/gzw015.
14. Akanuma S., Nakajima Y., et al. Yamagishi A. Experimental evidence for the thermophilicity of ancestral life. Proc. Natl. Acad. Sci. USA. 2013;110:11067–11072. doi: 10.1073/pnas.1308215110.
15. Sternke M., Tripp K.W., Barrick D. In: Tawfik D.S., editor. Academic Press; 2020. The use of consensus sequence information to engineer stability and activity in proteins; pp. 149–179. (Methods in Enzymology).
16. Sternke M., Tripp K.W., Barrick D. Consensus sequence design as a general strategy to create hyperstable, biologically active proteins. Proc. Natl. Acad. Sci. USA. 2019;116:11275–11284. doi: 10.1073/pnas.1816707116.
17. Nikolova P.V., Henckel J., et al. Fersht A.R. Semirational design of active tumor suppressor p53 DNA binding domain with enhanced stability. Proc. Natl. Acad. Sci. USA. 1998;95:14675–14680. doi: 10.1073/pnas.95.25.14675.
18. Tripp K.W., Sternke M., et al. Barrick D. Creating a homeodomain with high stability and DNA binding affinity by sequence averaging. J. Am. Chem. Soc. 2017;139:5051–5060. doi: 10.1021/jacs.6b11323.
19. El-Gebali S., Mistry J., et al. Finn R.D. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47:D427–D432. doi: 10.1093/nar/gky995.
20. Nozaki Y. The preparation of guanidine hydrochloride. Methods Enzymol. 1972;26:43–50. doi: 10.1016/s0076-6879(72)26005-0.
21. Street T.O., Courtemanche N., Barrick D. Methods in Cell Biology. Academic Press; 2008. Protein folding and stability using denaturants; pp. 295–325.
22. Marold J.D., Sforza K., et al. Barrick D. A collection of programs for one-dimensional Ising analysis of linear repeat proteins with point substitutions. Protein Sci. 2021;30:168–186. doi: 10.1002/pro.3977.
23. Keller S., Vargas C., et al. Schuck P. High-precision isothermal titration calorimetry with automated peak-shape analysis. Anal. Chem. 2012;84:5066–5073. doi: 10.1021/ac3007522.
24. Zhao H., Piszczek G., Schuck P. SEDPHAT--a platform for global ITC analysis and global multi-method analysis of molecular interactions. Methods. 2015;76:137–148. doi: 10.1016/j.ymeth.2014.11.012.
25. Robert F., Werner B. Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J. Comput. Chem. 1998;19:319–333.
26. Wells J.A. Additivity of mutational effects in proteins. Biochemistry. 1990;29:8509–8517. doi: 10.1021/bi00489a001.
27. Fraenkel E., Rould M.A., et al. Pabo C.O. Engrailed homeodomain-DNA complex at 2.2 Å resolution: a detailed view of the interface and comparison with other engrailed structures. J. Mol. Biol. 1998;284:351–361. doi: 10.1006/jmbi.1998.2147.
28. Fenton A.W., Page B.M., et al. Swint-Kruse L. Rheostat positions: a new classification of protein positions relevant to pharmacogenomics. Med. Chem. Res. 2020;29:1133–1146. doi: 10.1007/s00044-020-02582-9.
29. Meinhardt S., Manley M.W.J., Jr., et al. Swint-Kruse L. Rheostats and toggle switches for modulating protein function. PLoS One. 2013;8:e83502. doi: 10.1371/journal.pone.0083502.
30. Gribenko A.V., Patel M.M., et al. Makhatadze G.I. Rational stabilization of enzymes by computational redesign of surface charge-charge interactions. Proc. Natl. Acad. Sci. USA. 2009;106:2601–2606. doi: 10.1073/pnas.0808220106.
31. Wang Q., Buckle A.M., et al. Fersht A.R. Design of highly stable functional GroEL minichaperones. Protein Sci. 1999;8:2186–2193. doi: 10.1110/ps.8.10.2186.
32. Sullivan B.J., Nguyen T., et al. Magliery T.J. Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability. J. Mol. Biol. 2012;420:384–399. doi: 10.1016/j.jmb.2012.04.025.
33. Di Nardo A.A., Larson S.M., Davidson A.R. The relationship between conservation, thermodynamic stability, and function in the SH3 domain hydrophobic core. J. Mol. Biol. 2003;333:641–655. doi: 10.1016/j.jmb.2003.08.035.
34. Socolich M., Lockless S.W., et al. Ranganathan R. Evolutionary information for specifying a protein fold. Nature. 2005;437:512–518. doi: 10.1038/nature03991.
35. Myers J.K., Pace C.N., Scholtz J.M. Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci. 1995;4:2138–2148. doi: 10.1002/pro.5560041020.
36. Auton M., Bolen D.W. Predicting the energetics of osmolyte-induced protein folding/unfolding. Proc. Natl. Acad. Sci. USA. 2005;102:15065–15068. doi: 10.1073/pnas.0507053102.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Supporting materials and methods, Figs. S1–S4, and Tables S1–S5

mmc1.pdf^{(690.1KB, pdf)}

Document S2. Article plus supporting material

mmc2.pdf^{(2MB, pdf)}
