Abstract
The effect of a mutation on protein stability is traditionally measured by genetic construction, expression, purification, and physical analysis using low‐throughput methods. This process is tedious and limits the number of mutants able to be examined in a single study. In contrast, functional fitness effects can be measured in a high‐throughput manner by various deep mutational scanning tools. Using protein GB 1, we have recently demonstrated the feasibility of estimating the mutational stability effect ( G) of single‐substitution based on the functional fitness profile of all double‐substitutions. The principle is to identify genetic backgrounds that have an exhausted stability margin. The functional effect of an additional substitution on these genetic backgrounds can then be used to compute the mutational G based on the biophysical relationship between functional fitness and thermodynamic stability. However, to identify such genetic backgrounds, the approach described in our previous study required a benchmark dataset, which is a set of known mutational G. In this study, a benchmark‐independent approach is developed. The genetic backgrounds of interest are identified using k‐means clustering with the integration of structural information. We further demonstrated that a reasonable approximation of G can also be obtained without taking structural information into account. In summary, this study describes a novel method for computing G from double‐substitution functional fitness profiles alone, without relying on any known mutational G as a benchmark.
Keywords: protein stability, mutant stability prediction, mutagenesis, fitness profiling
Introduction
Traditionally, mutational stability is measured using spectroscopy‐based methods, or thermal denaturation and calorimetry. Although these techniques allow precise determination of protein stability, the experimental process is tedious. These low‐throughput methods limit the number of mutations able to be examined in a single study. In light of the importance and challenge of measuring mutational stability, several high‐throughput strategies have been explored (reviewed in [1]). They have been successfully applied to aid enzyme design,2 and to increase protein stability.3 Nonetheless, these existing approaches lack the power to accurately quantify mutational stability. As a result, there is a demand for an alternative high‐throughput approach to quantify mutational stability.
In contrast to protein stability, protein function can often be simply and directly assayed. In addition, the advancement of next‐generation sequencing (NGS) permits quantification of functional fitness for a large number of mutants in parallel using deep mutational scanning techniques (reviewed in [4]). However, according to the threshold robustness model, the relationship between functional fitness and stability is not straightforward.5 Many destabilizing mutations will not appreciably reduce the amount of native protein at room temperature. This is because the deleterious effect of destabilizing mutations can be buffered by the excess stability margin and may not be reflected in the functional fitness of the protein mutant.6, 7, 8, 9 Consequently, although functional fitness data can be readily obtained using deep mutational scanning, computing mutational stability effects from such data is a nontrivial process.
Recently, we have measured the affinity of all double‐substitutions on protein G using mRNA display.10 This dataset can be viewed as a set of functional fitness profiles of single‐substitutions in different genetic backgrounds that are one substitution away from wild type (WT). It opens up the opportunity to identify genetic backgrounds with an exhausted stability margin, which permit the quantification of G for individual substitutions across protein G. We have previously demonstrated the feasibility of using the fitness dataset of all double‐substitutions to compute G for individual single‐substitutions across the entire protein domain.10 However, the approach described previously requires a benchmark dataset, which is a set of mutations with known G. This prerequisite limited the application of our previously described G approximation approach to well‐characterized protein, in which mutational G is readily available in the literature. In this study, we were able to overcome this limitation by applying k‐means clustering approach to analyze the functional fitness profiling data. We demonstrated the feasibility of quantifying mutational stability effect from functional fitness profiling data alone, without any prior knowledge of the mutational G of the protein.
Results
Conceptual basis of transforming mutational fitness to G
Several studies have shown that naturally occurring proteins can tolerate one to a few destabilizing mutations without significantly affecting their folding and function.6, 7, 8, 9 It is best described by the threshold robustness model,5 in which the deleterious effect of destabilizing mutations can be buffered by the excess stability margin and would not result in a reduction of native protein [Fig. 1(A, B)]. However, if this stability margin is exhausted, addition of a destabilizing mutation will partially unfold the protein and lead to a decrease in observed functional fitness. The same principle can be applied to stabilizing mutations. When protein stability is above the threshold, a further increase in stability would not detectably increase the amount of native protein and would not improve functional fitness. Whereas if this stability margin is exhausted such that the protein is partially unfolded, addition of a stabilizing mutation will lead to an improvement in fraction folded and an increase in functional fitness. Consequently, under threshold robustness model, an additive effect in stability can lead to a nonadditive effect in functional fitness, hence epistasis.
We have recently described a deep mutational scanning dataset, in which the functional fitness of all single‐substitutions and double‐substitutions across protein G was measured in a high‐throughput manner.10 This dataset can be partitioned into different single‐substitution functional fitness profiles in all genetic backgrounds that are one substitution away from wild type (WT) [Fig. 1(C, D)]. Each functional fitness profile can then be converted to a G profile based on the biophysical relationship between fitness, protein folding, and stability (see Materials and Methods section). Nonetheless, accurate measurement of G from functional fitness requires a genetic background which has an exhausted stability margin, and therefore, be partially unfolded. If the stability margin of the genetic background is not exhausted (i.e., the protein is fully folded), the bona fide G value cannot be inferred from the corresponding functional fitness profile. Here, we refer to single‐substitutions that distinguish individual genetic backgrounds from WT as S BG.
Previously, we have shown the feasibility of identifying S BG of interest (genetic backgrounds with an exhausted stability margin) using the double‐substitution fitness dataset and a set of known mutational G.10 Briefly, we did an iterative comparison between the G profiles for individual S BG and a set of G for 82 single‐substitutions found in the literature. The correlation between the G profile and the benchmark is termed as R Literature. A range of R Literature (range: −0.40 to 0.90, mean: 0.21, standard deviation: 0.25) is observed for different S BG (Supporting Information, Fig. S1). In our previous study, those S BG that gave a high R Literature would be identified as the S BG of interest. Subsequently, the G of all the other substitutions across the protein can be extrapolated based on the S BG of interest. If the benchmark set was not available, we would not be able to identify the S BG of interest, hence compute the G for individual single‐substitutions. Unfortunately, such a benchmark dataset is not usually available for most naturally occurring proteins. Here, we aimed to develop a general approach, that is independent of any benchmark dataset, for identifying those S BG of interest to accurately compute G for individual single‐substitutions from functional fitness data alone.
Properties of genetic backgrounds of interest
Consequently, we investigated whether there was any intrinsic property for the S BG to achieve a high R Literature. This set of parameters, if existing, would help identify those genetic backgrounds that allow accurate computing of G when a set of benchmark G is not available.
Here, we examined two parameters, namely relative solvent accessibility (RSA) and functional fitness (W). We hypothesized that those S BG with a high R Literature would reside in the protein core, and thus, have a low RSA. The reasoning is that substitutions at the protein core are more likely to unfold the protein.11 In addition, we also proposed that S BG that partially unfolded the protein would acquire an intermediate fitness between the WT protein (W = 1) and completely nonfunctional variants (W = 0.01 is the limit of detection). Indeed, our data are consistent with these notions [Fig. 2(A) and Supporting Information, Fig. S2(A,B)]. Most S BG with a R Literature of >0.75 have an intermediate fitness and reside in the core of the protein [Fig. 2(B, C) and Supporting Information, Table S1]. However, although many S BG with a high R Literature displayed an intermediate fitness and low RSA, a significant number of S BG with a low R Literature also satisfied these selection criteria [Fig. 2(A) and Fig. Supporting Information, S2(C)]. This suggests that additional parameters need to be considered to identify those S BG of interest.
Grouping genetic backgrounds based on intercorrelation of G profiles
Instead of testing more parameters, we turned to explore a more general approach to identify those S BG of interest based on clustering of the G profiles. The underlying rationale is that those S BG with high R Literature should have a similar G profile, thus being grouped together. Since beneficial S BG was unlikely to exhaust the stability margin and would give a complex number in the transformation of the fitness profile to the G profile [see Eq. (11) in Materials and Methods section], only 678 S BG, which had a W of 0–1, were included in the subsequent analyses unless otherwise stated.
To test this idea, we first built a correlation matrix based on G profiles. This correlation matrix represented all the pairwise correlation among the G profiles of individual S BG. Subsequently, a hierarchical clustering was performed to group S BG based on the correlation matrix. Interesting, we observed three major groups of S BG, in which the G profiles of the S BG within the group were intercorrelating [Fig. 3(A)]. According to our rationale, S BG of interest would fall into the same group. Therefore, the distribution of R Literature for individual S BG within each group was examined [Fig. 3(B)]. The R Literature of S BG in group II was significantly higher than that of group I and group III (P = 4 , Wilcoxon rank‐sum test). In fact, 8 of the top 10 S BG with the highest R Literature were within group II. This suggests that S BG with high R Literature would group together.
We aimed to further understand the biophysical basis of the clustering result for group I (Supporting Information, Table S2), group II (Supporting Information, Table S3) and group III (Supporting Information, Table S4) by examining the fitness and RSA of the S BG within each group [Fig. S3(A, B)]. Group I was enriched in highly deleterious, and presumably highly destabilizing S BG. Under these genetic backgrounds, addition of a stabilizing mutation would not improve functional fitness due to the extremely destabilizing effect brought by S BG. Likewise, addition of a destabilizing mutation would also exert no effect on functional fitness, simply because the protein cannot be further unfolded. Thus, the S BG in group I produced correlated G profiles simply due to lethality common to all S BG in group I. S BG in group III possessed a range of functional fitness and RSA. When we looked at the list of S BG in group III, we found that 92% of S BG were located within a dynamic region which displays positive epistasis in general.10 This indicates the intercorrelation among S BG in group III was predominantly driven by a functional epistatic effect. S BG in group II acquired the best R Literature among these three groups. Many of the S BG in group II were at core residues which are predicted to partially unfolded the protein. Consistent with this notion, those S BG in group II had an attenuated functional fitness and a low RSA.
Overall, these analyses demonstrated the possibility of grouping S BG with high R Literature together. Nonetheless, decreasing the size of each group (i.e., increasing the number of groups) was needed to improve the true positive rate in the S BG group of interest. Since there were only a small number of S BG of interest, a smaller cluster size would group the S BG of interest and exclude SBG that possessed a lower R Literature.
Identification of genetic backgrounds of interest by k‐means clustering and subsequent group sorting
To further refine the grouping of S BG, we performed a k‐means clustering based on the correlation matrix. By the definition of k‐means clustering, each run of clustering would generate k S BG groups. This method would give a fine control of the number of S BG groups being generated in each clustering run. The remaining challenge was to distinguish the S BG group of interest (i.e., the S BG group with an enrichment in S BG with a high R Literature) from other S BG groups.
From our analysis above, we observed that those S BG with a high R Literature possessed a low RSA. Therefore, we aimed to design a RSA‐based sorting strategy to identify the S BG group of interest. First, k‐means clustering were performed 100 times to generate 100 × k S BG groups (see Materials and Methods section). The mean RSA of all S BG within each S BG group was then computed. Subsequently, a mean RSA was obtained for each of the 100 × k S BG groups. In this approach, the S BG group with the lowest mean RSA would be identified as the S BG group of interest. The G for each substitution was then calculated using the median G among the genetic backgrounds within the group of interest.
We tested this approach with a range of k using those 678 S BG which carried a functional fitness within 0 to 1. When the k was sufficiently large (k > 10), the S BG group of interest would carry an R Literature of ∼0.8 [Fig. 4(A)]. We also tested if the result would improve by only considering a subset of S BG within a narrower functional fitness range. The underlying rationale is from the analysis above, in which S BG of interest carried an intermediate functional fitness. Indeed, the result could be slightly improved by filtering out those S BG with a low functional fitness (W < 0.4). Furthermore, an apparent correlation was observed between the mean RSA and R Literature for S BG groups generated from k‐means clustering [Fig. 4(B)]. In contrast, G predicted from Rosetta software only achieved an R Literature of 0.52.13 It highlights the strength of estimating G from functional fitness data and reflects the success of our approach. Those S BG within best hit (R Literature = 0.91) are indicated in Supporting Information, Table S1. The G for all single‐substitutions computed from the best hit are shown in Supporting Information, Table S5.
Since RSA can only be obtained when the structural information is available, another sorting parameter would be needed. Consequently, we tested whether the S BG group of interest could be identified without using RSA as the sorting parameter. Instead, we used hydrophobicity of the WT residue of each S BG as the sorting parameter. The rationale is that core residues (low RSA) are in general more hydrophobic. We performed k‐means clustering for 100 times to generate 100 × k S BG groups. The mean hydrophobicity for each S BG group was used as the sorting parameter. Here, the Hopp–Woods hydrophilicity scale was used,15 in which the lower the value, the more hydrophobic the residue is. The S BG group with the lowest mean Hopp–Woods hydrophilicity value would be identified as the S BG group of interest. The S BG group of interest carry an R Literature of ∼0.6 to ∼0.8 for a wide range of k [Fig. 4(C)]. Nonetheless, the result would get worse as k increased above 20, especially when those S BG with a low functional fitness (W < 0.4) were included in the run. Although the result was not as robust as that of using RSA as the sorting parameter, it showed the possibility of computing G from functional fitness data without any prior knowledge in the protein structure.
Discussion
Developing a high‐throughput approach for measuring mutant stability effect is a popular research area in both experimental and computational biology. It is because the stability effect upon mutation is the foundation of various biological processes, ranging from natural evolution to protein engineering. Experimental approaches usually infer the mutant stability effect from an indirect functional readout, which possess a low quantitative potential for computing G. Computational approaches estimate the mutant stability effect based on a predefined set of force field parameters and energy functions. It has been demonstrated that in silico mutant stability prediction can achieve a correlation of ∼0.6 to ∼0.7 with experimental data.14, 16 This imperfect correlation indicates more parameters have to be considered during the prediction. In this study, we demonstrated an alternative approach of computing mutational stability effect based on functional profiling of double substitutions. By integrating the structural information as RSA for each residues, we were able to reliably compute the G for individual substitutions across the entire protein domain.
The principle of our approach is that mutational G and functional fitness follow a defined biophysical relationship when the genetic background has an exhausted stability margin, hence is partially unfolded. Those genetic backgrounds, despite being useful, are presumably rare and difficult to be identified. When k‐means clustering is applied to group the S BG based on their intercorrelations of G profile, those S BG of interest would be grouped together. Among different S BG groups produced by the clustering, the S BG group of interest can be identified by the low mean RSA. One caveat of using mean RSA as the sorting parameter is that other S BG groups may possess an equally low mean RSA as that of the S BG group of interest. It would happen if those low RSA S BG that have a low R Literature intercorrelated with each other. Of note, S BG that have a low R Literature do exist, as evidenced by Figure 3(A). But apparently, these S BG do not have a tight intercorrelation as evidenced by Figure 4(B), in which S BG group with low R Literature and low mean RSA do not exist. This result suggests that correlation of G profile between different S BG is independent of RSA similarity, which is indeed observed in our study (Supporting Information, Fig. S4).
We acknowledge that our approach may not be the optimum method for computing mutational G from functional fitness profiling data. While mutations at residues with low RSA are more likely to exhaust the stability margin of the protein,11 certain mutations at residues with high RSA may also achieve the same effect. It is possible that in other proteins, mutations at residues with high RSA provide a better genetic background for computing mutational G from functional fitness profiling data. Therefore, alternative approaches should be investigated in the future. For instance, usage of maximum‐entropy probability models, which are efficient at discerning real correlation in biological data with a high signal‐to‐noise ratio,17 may improve the performance in identifying the S BG group of interest.
Due to the continued development of high‐throughput sequencing technologies, functional fitness profiling has recently become a popular tool. Most studies focuses on the functional effect of single‐substitution, which is not instrumental for computing the mutational stability effect. Our recent study has demonstrated the possibility of studying the fitness effect of all double‐substitution on a protein domain.10 Here, we further showed that such datasets alone can be used to identify the G for all single‐substitution without any prior knowledge on mutational G of the given protein. In summary, the approach described in this study represents a combination of an advanced experimental techniques and a novel analytical method for G profiling. This approach can be applied to other proteins of interest to facilitate protein engineering and study natural evolution from a thermodynamic perspective.
Materials and Methods
Fitness profiling data
Experimental profiling data, which contained the fitness value relative to WT of all single substitutions and all double substitutions across protein G binding domain 1, was obtained from our previous study.10 We only employed the fitness value from mutants (single or double substitutions) with an input count of >10 to minimize the statistical noise in computing the fitness value. All 19 × 55 = 1045 single substitutions and 529,902 (98.8%) out of = 536,085 all possible double substitutions satisfied this cutoff.
RSA measurement
Solvent accessibility surface area (SASA) was measured by DSSP18 using the protein G monomer structure in PDB 1PGA.12 SASA was then normalized with the SASA calculated from an unfolded Gly‐X‐Gly tripeptide structure (X = any amino acid) using PyMOL to obtain the RSA.
Transforming fitness profile to G profile
The fitness profile of double substitutions in the WT genetic background was partitioned into 1485 single substitution fitness profiles in a genetic background that was one substitution away from WT. Each fitness profile was then converted to a G profile based on the assumptions described in our recent publication.10 In this study, we referred G as the G of unfolding. Thus, the higher the G, the more stability the substitution confers. Here the relative association constant is indicated by ΔKA, which represents the ratio of association constant (K A) for a given mutant to K A of the WT. In addition, subscript N represents native state and subscript U represents unfolding. Here are the assumptions:
- Assuming mutational effect on affinity is additive, such that the relative association constant (ΔK A) of the double substitution (AB) is a product of that of the corresponding single substitutions (A and B). Therefore,
where represents the relative association constant for double substitution AB, represents the relative association constant for single substitution A, and represents the relative association constant for single substitution B.(1) Assuming the fitness (W) of a given substitution i is a product of the fraction of the protein in the native state (f N) and the fitness of the native state (W N). Therefore, W i = f × W Here, we set the fitness for WT (W WT) to 1, the fraction of the protein in the native state for WT (f ) to 1 (the amount of unfolded protein for WT is negligible), and the fitness of the native state for WT (W ) to 1.
Assuming the native state of a given substitution i has the same fitness with that of the WT. Therefore, W = W = 1. Hence, according to assumption 2, W i = f .Since fitness of the native state (W N) can be related to the relative association constant (ΔK A) by a one‐to‐one function as previously described,10 genotypes with the same W N carry the same ΔK A. Therefore, the binding constant of a given substitution i is the same as that of the WT (K = K ) if the native state of a given substitution i has the same fitness with that of the WT. Hence, Δ = 1.
Assuming the native state of a given substitution i has the same fraction of protein in the native state as that of the WT. Therefore, , and hence according to assumption 2, W i = W .
We anticipated that most substitutions would satisfy assumption 4 but not assumption 3. This is reasoned by the fact that most substitutions would not unfold the protein at room temperature. In contrast, only a few substitutions would partially or fully unfold the protein.
In the following, we considered that substitution A satisfied assumption 3 and substitution B satisfied assumption 4, fraction of the protein in the native state for double substitution AB ( ) would be equal to the ratio of the fitness of double substitution AB to the fitness of substitution B (W AB/W B), which is derived as follows:
According to assumption 2, W AB = × W , and, therefore:
(2) |
Since substitution A satisfied assumption 3,
(3) |
Since substitution B satisfied assumption 4,
(4) |
Substituting Eq. (3) into Eq. (1) states that the relative association constant of double substitution AB equals the relative association constant of substitution B (Δ = Δ ), hence fitness of the native state of double substitution AB ( ) equals fitness of the native state of substitution B ( ):
(5) |
Substituting Eq. (4) into Eq. (5) states that the fitness of the native state of double substitution AB equals fitness of substitution B:
(6) |
Substituting Eq. (6) into Eq. (2) states that the ratio of the fitness to the fraction of protein in the native state of substitution AB equals the fitness of substitution B (W AB/ = W B), and hence:
(7) |
Assuming stability effect is additive, the change in free energy of unfolding for substitution B can be obtained by subtracting the change in free energy of unfolding for substitution A from the change in free energy of unfolding for double substitution AB ( = Δ − Δ ). The relationship between an equilibrium constant (K) and change of free energy (ΔG) states that ΔG = , where R represents the ideal gas constant and T represents the temperature. Therefore,
(8) |
where represents the equilibrium constant of unfolding of substitution A and represents the equilibrium constant of unfolding of double substitution AB. Since equilibrium constant of unfolding (K U) is related to fraction of protein in the native state (f N) by K U = (1 − f N)/f N, Eq. (8) can be rearranged to:
(9) |
Substituting Eq. (7) into Eq. (9):
(10) |
Since substitution A satisfied assumption 3, W A = :
(11) |
The challenge was to identify those substitutions that satisfy assumption 3 (i.e., substitution A, see above). In this study, this set of substitutions was called (S BG), which acted as a substitution in the genetic background for accurately estimate of the real G of each single substitution across the protein domain (i.e., substitution B, see above). These substitutions would be those S BG with a high R Literature. In contrast, when a substitution that did not satisfy assumption 3 was used as a genetic background for computing the G of other substitutions, the G would be deviate from the real G value. These substitutions would be those S BG with a low R Literature.
Clustering of background substitutions (S BG)
A correlation matrix for all genetic background was generated based on the pairwise Pearson's correlation of their single‐substitution G profiles. k‐means clustering was performed on the correlation matrix to group the S BG. k‐means clustering was performed 100 times using random seeding. As a result, 100 × k S BG groups were obtained. Groups with < 3 S BG were filtered out. The G of each substitution was then calculated by taking the median value of its G computed from individual genetic backgrounds with a given S BG group. Of note, we acknowledge that nonadditive stability would prevent accurate estimation of G [Eq. 8]. Therefore, we took the median value instead of mean to prevent G estimation being confounded by sporadic nonadditive stability effect. For example, if the S BG group of interest consists three different genetic backgrounds (e.g., A, B, and C). If we wanted to compute the G for mutation Z, which has an estimated G = 0 under genetic background A, an estimated G = 1 under genetic background B, and an estimated G = −1 under genetic background C, the median value (in this case, G = 0) would be taken as the final computed G for mutation Z in our approach.
Supporting information
References
- 1. Magliery TJ, Lavinder JJ, Sullivan BJ (2011) Protein stability by number: high‐throughput and statistical approaches to one of protein science's most difficult problems. Curr Opin Chem Biol 15:443–451. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Giver L, Gershenson A, Freskgard PO, Arnold FH (1998) Directed evolution of a thermostable esterase. Proc Natl Acad Sci U S A 95:12809–12813. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Foit L, Morgan GJ, Kern MJ, Steimer LR, von Hacht AA, Titchmarsh J, Warriner SL, Radford SE, Bardwell JC. (2009) Optimizing protein stability in vivo. Mol Cell 36:861–871. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Fowler DM, Fields S (2014) Deep mutational scanning: a new style of protein science. Nat Methods 11:801–807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH. (2005) Thermodynamic prediction of protein neutrality. Proc Natl Acad Sci USA 102:606–611. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Wilke CO, Bloom JD, Drummond DA, Raval A (2005) Predicting the tolerance of proteins to random amino acid substitution. Biophys J 89:3714–3720. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability. Proc Natl Acad Sci USA 103:5869–5874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS (2006) Robustness‐epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444:929–932. [DOI] [PubMed] [Google Scholar]
- 9. Gong LI, Suchard MA, Bloom JD (2013) Stability‐mediated epistasis constrains the evolution of an influenza protein. Elife 2:e00631 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Olson CA, Wu NC, Sun R (2014) A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr Biol 24:2643–2651. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS (2007) The stability effects of protein mutations appear to be universally distributed. J Mol Biol 369:1318–1332. [DOI] [PubMed] [Google Scholar]
- 12. Gallagher T, Alexander P, Bryan P, Gilliland GL (1994) Two crystal structures of the b1 immunoglobulin‐binding domain of streptococcal protein g and comparison with NMR. Biochemistry 33:4721–4729. [PubMed] [Google Scholar]
- 13. Das R, Baker D (2008) Macromolecular modeling with rosetta. Annu Rev Biochem 77:363–382. [DOI] [PubMed] [Google Scholar]
- 14. Kellogg EH, Leaver‐Fay A, Baker D (2011) Role of conformational sampling in computing mutation‐induced changes in protein structure and stability. Proteins 79:830–838. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 78:3824–3828. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Potapov V, Cohen M, Schreiber G (2009) Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 22:553–560. [DOI] [PubMed] [Google Scholar]
- 17. Stein RR, Marks DS, Sander C (2015) Inferring pairwise interactions from biological data using maximum‐entropy probability models. PLoS Comput Biol 11:e1004182 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features. Biopolymers 22:2577–2637. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.