Abstract
Protein destabilization by amino acid substitutions is proposed to play a prominent role in widespread inherited human disorders, not just those known to involve protein misfolding and aggregation. To test this hypothesis, we computationally evaluate the effects on protein stability of all possible amino acid substitutions in 20 disease-associated proteins with multiple identified pathogenic missense mutations. For 18 of the 20 proteins studied, substitutions at known positions of pathogenic mutations are significantly more likely to destabilize the native protein fold (as indicated by more positive values of ΔΔG). Thus, positions identified as sites of disease-associated mutations, as opposed to non-disease-associated sites, are predicted to be more vulnerable to protein destabilization upon amino acid substitution. This finding supports the notion that destabilization of native protein structure underlies the pathogenicity of a broad set of missense mutations, even in cases where reduced protein stability and/or aggregation are not characteristic of the disease state.
Keywords: Destabilization, Inherited disorder, Pathogenic mutation, Stability, Aggregation
Introduction
The need for proteins to adopt a stable folded structure constrains protein evolvability, as the destabilizing effects of many nascent amino acid substitutions negate any functional improvements they might confer (Drummond and Wilke 2008; Zeldovich et al. 2007; Bloom et al. 2006). Amino acid substitutions encoded by disease-associated single nucleotide polymorphisms (SNPs) may owe their pathogenicity to disruption of one or more crucial features of the native protein, such as overall folding stability, ligand binding, allosteric coupling, catalytic activity, and post-translational maturation (Yue et al. 2005; Wang and Moult 2001; Xu and Zhang 2014). Several computational tools, including FoldX (Guerois et al. 2002), PolyPhen (Ramensky et al. 2002), PANTHER (Thomas et al. 2003), SIFT (Ng and Henikoff 2003), nsSNPAnalyzer (Bao et al. 2005), PhD-SNP (Capriotti et al. 2006), Eris (Yin et al. 2007a, b), SNAP (Bromberg and Rost 2007), MutPred (Li et al. 2009), SNPs&GO (Calabrese et al. 2009), and PolyPhen2 (Adzhubei et al. 2010) have been developed to (i) determine the impact of pathogenic and benign mutations on the structure and function of a human protein, and (ii) identify mutations that can eliminate the deleterious effects introduced by pathogenic ones. However, it is still unclear whether disease-associated missense mutations are significantly enriched in regions of protein sequence with highest vulnerability to destabilization. To evaluate this possibility and the hypothesis that protein destabilization plays a role in the pathologies of diverse monogenic disorders, we perform an exhaustive survey of amino acid substitutions in 20 proteins with missense mutations linked to various human diseases. For each substitution, we calculate the resulting change in folding free energy (ΔΔG) using the Eris software (Yin et al. 2007a, b; Ding and Dokholyan 2006), utilizing available structures in the PDB as starting structures (Methods section). 
Substitutions involving cysteine residues (either to or from cysteine) are excluded, since these ΔΔG calculations cannot account for the contribution of disulfide bonds to protein stability. The proteins surveyed in this study range in number of subunits from 1 (monomeric) to 10 (decameric) and are implicated in human diseases that affect disparate tissues and arise through distinct pathogenic mechanisms (Table 1).
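The enumeration of substitutions described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `enumerate_substitutions`, the residue-numbering argument, and the toy sequence are all assumptions introduced here. It shows why each non-cysteine position yields 18 variants (20 amino acids, minus the native residue, minus cysteine).

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_substitutions(sequence, first_residue_number=1):
    """Yield (position, native, substitute) for every allowed substitution.

    Substitutions to or from cysteine are skipped, mirroring the exclusion
    described in the text. Numbering from `first_residue_number` is an
    assumption of this sketch, not something specified by the paper.
    """
    for offset, native in enumerate(sequence):
        if native == "C":
            continue  # no substitutions *from* cysteine
        for substitute in AMINO_ACIDS:
            if substitute == native or substitute == "C":
                continue  # skip identity and substitutions *to* cysteine
            yield first_residue_number + offset, native, substitute


# Toy example: a hypothetical 4-residue fragment, not a real protein.
subs = list(enumerate_substitutions("ACKW"))
print(len(subs))  # 54: 3 non-cysteine positions x 18 variants each
print(subs[0])    # (1, 'A', 'D')
```

Each yielded tuple would then be passed to a stability estimator; the call into Eris itself is deliberately left out, since its interface is not described in the text.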
Table 1.
Disease-linked proteins analyzed in this study
| Gene | Protein product | Prominent associated disorder(s) | PDB ID | # of subunits in biological assembly | Disease-associated/neutral AA positionsᵃ | p value (neutral vs. pathological ΔΔG)ᵇ |
|---|---|---|---|---|---|---|
| SOD1 | Cu/Zn superoxide dismutase | Amyotrophic lateral sclerosis | 1SPD | 2 | 63/86 | <2.20 × 10⁻¹⁶ |
| TTR | Transthyretin | Amyloidosis | 1TTA | 4 | 53/73 | <2.20 × 10⁻¹⁶ |
| HBA1 | Hemoglobin, alpha subunit | Alpha-thalassemia | 2HHB | 4 (2α + 2β) | 24/117 | 1.11 × 10⁻¹¹ |
| HBB | Hemoglobin, beta subunit | Beta-thalassemia | 2HHB | 4 (2α + 2β) | 32/114 | 1.10 × 10⁻¹⁰ |
| HPRT1 | Hypoxanthine phosphoribosyltransferase 1 | Lesch–Nyhan syndrome | 1BZY | 4 | 80/134 | <2.20 × 10⁻¹⁶ |
| GLA | Alpha-galactosidase | Fabry disease | 1R46 | 2 | 177/213 | <2.20 × 10⁻¹⁶ |
| PKLR | Human erythrocyte pyruvate kinase | Hemolytic anemia | 2VGB | 4 | 108/409 | 5.33 × 10⁻⁷ |
| PAH | Phenylalanine hydroxylase | Phenylketonuria | 2PAH | 4 | 185/144 | <2.20 × 10⁻¹⁶ |
| ARSB | Arylsulfatase B | Mucopolysaccharidosis VI | 1FSU | 1 | 70/404 | <2.20 × 10⁻¹⁶ |
| OTC | Ornithine carbamoyltransferase | Hyperammonemia | 1OTH | 3 | 148/173 | <2.20 × 10⁻¹⁶ |
| CD40LG | CD40 ligand | Hyper-IgM syndrome | 1ALY | 3 | 35/111 | 3.00 × 10⁻¹⁵ |
| UROD | Uroporphyrinogen decarboxylase | Porphyria | 1URO | 2 | 51/306 | 1.84 × 10⁻¹³ |
| PRNP | Prion protein | Various prion diseases | 1I4M | 2 | 25/83 | 0.474 |
| GCH1 | GTP cyclohydrolase I | Dopa-responsive dystonia | 1FB1 | 10 | 57/139 | 9.31 × 10⁻¹³ |
| CYB5R3 | Cytochrome b5 reductase 3 | Methemoglobinemia | 1UMK | 1 | 26/245 | 6.06 × 10⁻⁷ |
| OAT | Ornithine aminotransferase | Gyrate atrophy | 1OAT | 2 | 30/374 | 1.29 × 10⁻⁷ |
| GUSB | Beta-glucuronidase | Mucopolysaccharidosis VII | 1BHG | 4 | 34/577 | 0.0790 |
| PDHA1 | Pyruvate dehydrogenase (alpha subunit) | Lactic acidosis | 1NI4 | 4 | 46/315 | 8.10 × 10⁻⁶ |
| PAX6 | Paired box 6 | Aniridia | 6PAX | 1 | 38/95 | 1.54 × 10⁻⁶ |
| LMNA | Lamin A/C (globular domain) | Muscular dystrophy; cardiomyopathy | 1IFR | 1 | 25/88 | 1.39 × 10⁻⁶ |

ᵃ Includes only positions at which amino acid substitutions were introduced for ΔΔG calculations in this study (excludes cysteines and any residues missing from the crystal structures)
ᵇ As determined by a two-sample Kolmogorov–Smirnov test comparing the complete sets of calculated ΔΔG values for substitutions at positions of known disease mutations (“pathological”) to substitutions at sites without identified pathological mutations (“neutral”)
Despite a lack of evidence for involvement of misfolding and/or aggregation of disease-linked mutant proteins in many cases, the majority (18/20) of proteins assayed exhibit a statistically significant shift toward more positive ΔΔG values (more destabilizing) for substitutions at sites of disease-associated missense mutations, as compared to those for mutations at non-disease-associated (“neutral”) sites (Table 1; Fig. 1). For these 18 proteins, p-values range from roughly 10⁻⁶ to below 2.2 × 10⁻¹⁶ (Table 1). In contrast, the prion protein and beta-glucuronidase exhibit no significant difference between the distributions of calculated ΔΔG values for substitutions at disease-linked and neutral amino acid positions (p = 0.474 and 0.0790, respectively). It is possible that pathogenic substitutions in these two proteins do not influence overall protein stability, but instead disrupt physiological functions by perturbing protein dynamics or interactions with other macromolecules (Sahni et al. 2015).
Fig. 1.
Amino acid substitutions at sites of known disease-linked missense mutations are more destabilizing than those at neutral positions. Histograms show normalized counts of ΔΔG values calculated for all possible substitutions (excluding those involving cysteine) at sites of known disease-linked substitutions (red curves) and all other “neutral” amino acid positions (black curves). Bar graphs show the normalized frequencies of ΔΔG values exceeding n × 10 kcal/mol, where n is the number of subunits in the biological oligomeric assembly of each protein (Color figure online)
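The bar-graph quantity in the caption, the normalized frequency of ΔΔG values exceeding n × 10 kcal/mol, can be illustrated with a short sketch. The function name `fraction_exceeding` and the ΔΔG values below are invented for illustration; only the threshold rule (number of subunits times 10 kcal/mol) comes from the caption.

```python
def fraction_exceeding(ddg_values, n_subunits, per_subunit_cutoff=10.0):
    """Fraction of ddG values (kcal/mol) above n_subunits * per_subunit_cutoff."""
    if not ddg_values:
        raise ValueError("need at least one ddG value")
    threshold = n_subunits * per_subunit_cutoff
    return sum(1 for value in ddg_values if value > threshold) / len(ddg_values)


# Made-up ddG values for a hypothetical dimer (threshold = 2 x 10 = 20 kcal/mol).
pathogenic = [25.0, 3.1, 40.2, -1.0]
neutral = [0.5, 2.0, 21.0, -0.3, 1.1]
print(fraction_exceeding(pathogenic, n_subunits=2))  # 0.5
print(fraction_exceeding(neutral, n_subunits=2))     # 0.2
```

Scaling the cutoff by the number of subunits keeps the comparison per-chain, since a substitution modeled in an oligomer is introduced into every copy of the chain.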
Using the ΔΔG values calculated as described above, we next evaluated in more detail the vulnerability of individual secondary structure elements of Cu/Zn superoxide dismutase (SOD1) to destabilizing amino acid substitutions (Khare et al. 2006). Sites of ALS-linked mutations in the gene encoding SOD1 are distributed throughout the protein sequence (Redler and Dokholyan 2012), with the exception of a stretch of residues encompassing the third β-strand (Fig. 2a). We hypothesize that the striking absence of pathogenic substitutions identified in this region could be explained by an increased tolerance to amino acid substitutions. If substitutions in β3 are more likely to be neutral or protective (having a distribution shifted toward more negative ΔΔG values), then individuals with these polymorphisms would not be expected to present with ALS. Alternatively, if β3 substitutions tend to substantially destabilize SOD1 (more positive ΔΔG values) to the point that expression of the mutant protein is potently toxic, such substitutions may never have been identified in individuals due to embryonic or early postnatal lethality. To distinguish between these possibilities, we compare the ΔΔG distributions of SOD1’s individual β-strands to each other and to the distributions for “pathogenic” and “neutral” amino acid positions in SOD1. We find that the distribution of calculated ΔΔG values is substantially shifted toward more negative values (more stabilizing) for β3 (gray curve, Fig. 2b) compared to all other individual β-strands, as well as to the sets of pathogenic and neutral positions. Our results suggest that the absence of ALS-associated mutations in β3 is due to this region’s exceptional tolerance to amino acid substitution.
Fig. 2.
Lower vulnerability to destabilization by amino acid substitution may explain the lack of ALS-linked missense mutations in β3 of SOD1. a Amino acid sequence of SOD1. Sites of ALS-causative missense mutations are shown in red and arrows above the sequence mark β-strand regions. b Secondary structure of SOD1 (PDBID: 1SPD). c Histograms representing calculated ΔΔG values of mutations in neutral and pathogenic sites (as shown in Fig. 1) and in positions comprising each β-strand. d Bar graphs showing the normalized frequencies of ΔΔG values exceeding 20 kcal/mol for neutral, pathogenic, and β-strand amino acid positions (Color figure online)
Based on this exhaustive survey, we conclude that missense mutations linked to genetic disorders are significantly more likely to occur at positions that perturb the structural stability of native proteins, even when their associated phenotypes do not include observable protein destabilization or aggregation. Prior work probing the link between stability change and pathogenicity of missense mutations has focused on characterization of relatively small sets of disease-linked amino acid substitutions (rather than evaluating all possible substitutions), or comprehensive in silico mutagenesis for a single protein (Stefl et al. 2013; Yin et al. 2007a, b). For example, Sahni et al. (2015) recently evaluated the effect of disease-associated missense mutations on overall protein stability and the robustness of interactions with native binding partners, concluding that a minority of disease-linked substitutions lead to a significant overall decrease in protein stability. Rather, they report that pathogenic substitutions are more likely to disrupt a protein’s interactions with its native binding partners. In assessing protein products of missense mutations more comprehensively, we find evidence for a widespread vulnerability of disease-associated amino acid positions to destabilizing substitutions.
Most proteins are marginally stable (Dokholyan and Shakhnovich 2001; Dokholyan 2008; Williams et al. 2006) in their functional forms and mutations linked to inherited human disorders exert their toxic effects through a variety of mechanisms, including disruption of the native folding behavior of the affected gene’s protein product and concomitant loss of function or novel toxic properties. Our findings, in concordance with previous work, are consistent with the idea that even small reductions in protein stability can lead to dysfunctional proteins associated with human disease, and that disease-linked missense mutations are enriched in regions of protein sequence with highest vulnerability to destabilization. On the other hand, studies of protein evolution have shown that proteins can tolerate many amino acid substitutions, including substitutions at highly conserved regions, by introducing compensatory mutations to counterbalance the effects of deleterious mutations; this phenomenon may explain the fixation of human disease-associated amino acids as wild type residues in orthologous proteins of other species (Xu and Zhang 2014).
All these findings point to the fact that the mutational landscape of proteins is exceedingly complex. Understanding of the biophysical underpinnings of selection for stable, correctly folded, and functional gene products is still lacking (Depristo et al. 2005). Our work reveals a widespread vulnerability of sites of disease-associated mutations to destabilizing substitutions, even when reduced protein stability and/or aggregation are not characteristic of the disease state. Furthermore, the methodology employed provides a platform for the rational control of protein stability through mutagenesis, which could be useful in refining protein evolution models and improving prediction algorithms.
Methods
We employ the Eris suite, a protein stability estimation tool (Yin et al. 2007a, b; Ding and Dokholyan 2006), to probe the effects of all possible amino acid substitutions in 20 proteins with multiple identified pathogenic missense mutations. Upon substitution, Eris re-packs the side chains of the residues surrounding the mutated residue using a Monte Carlo simulated annealing procedure. The change in protein stability induced by a mutation relative to the wild-type protein is calculated as ΔΔG = ΔG_mutant − ΔG_wild-type, using the Medusa force field. We calculate ΔΔG for all possible substitutions at all amino acids included in the crystal structure, excluding substitutions to and from cysteine (i.e., for each amino acid in a given structure, ΔΔG is calculated for 18 non-native variants). We classify amino acid positions as disease-associated if these sites contain at least one pathogenic missense mutation, as documented in the Human Gene Mutation Database (www.hgmd.cf.ac.uk). The differences between the sets of all calculated ΔΔG values for disease-associated and neutral amino acid positions are evaluated for significance using the two-sample Kolmogorov–Smirnov test (Press et al. 2007).
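The statistical comparison described above can be sketched in pure Python. The text does not say which implementation was used, so this is an assumption-laden illustration: `ks_two_sample` is a name introduced here, the p-value uses the asymptotic Kolmogorov series with the small-sample correction given in Numerical Recipes, and the ΔΔG values are made up.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test.

    D is the largest distance between the two empirical CDFs. The p-value
    is an asymptotic approximation, not an exact small-sample value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")

    i = j = 0
    d = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # Step past every value equal to x so that ties move both CDFs together.
        while i < n_a and a[i] == x:
            i += 1
        while j < n_b and b[j] == x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))

    root = math.sqrt(n_a * n_b / (n_a + n_b))  # sqrt of effective sample size
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.1:
        return d, 1.0  # the series below converges slowly here; Q_KS is ~1

    # Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        p += term
        if abs(term) < 1e-12:
            break
    return d, min(max(p, 0.0), 1.0)


# Made-up ddG values (kcal/mol) for illustration only.
pathological_ddg = [2.0, 5.5, 8.1, 12.4]
neutral_ddg = [-0.4, 0.3, 1.2, 2.5, 0.9]
d_stat, p_value = ks_two_sample(pathological_ddg, neutral_ddg)
print(d_stat, p_value)
```

The test is sensitive to any difference between the two distributions, so the direction of the shift (toward more positive ΔΔG at disease-associated positions) has to be read from the histograms rather than from the p-value alone.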
Acknowledgments
This work was supported by the National Institutes of Health grant R01GM080742 to N.V.D. R.L.R. was supported by the National Institutes of Health Predoctoral Fellowship F31NS073435 from the National Institute of Neurological Disorders and Stroke. We thank Michael Caplow and Feng Ding for helpful discussions regarding study design.
Footnotes
Compliance with Ethical Standards
Research Involving Human and Animal Rights: This study does not involve research with humans and/or animals and it follows all the ethical standards.
Conflict of interest: The author declares that there is no conflict of interests.
References
- Adzhubei IA, et al. A method and server for predicting damaging missense mutations. Nat Meth. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
- Bao L, Zhou M, Cui Y. nsSNPAnalyzer: identifying disease-associated nonsynonymous single nucleotide polymorphisms. Nucl Acids Res. 2005;33:W480–W482. doi: 10.1093/nar/gki372.
- Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. Proc Natl Acad Sci USA. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103.
- Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucl Acids Res. 2007;35:3823–3835. doi: 10.1093/nar/gkm238.
- Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat. 2009;30:1237–1244. doi: 10.1002/humu.21047.
- Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics. 2006;22:2729–2734. doi: 10.1093/bioinformatics/btl423.
- Depristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet. 2005;6:678–687. doi: 10.1038/nrg1672.
- Ding F, Dokholyan NV. Emergence of protein fold families through rational design. PLoS Comput Biol. 2006;2:e85. doi: 10.1371/journal.pcbi.0020085.
- Dokholyan NV. Structural Bioinformatics. 2nd edn. Hoboken: Wiley; 2008. Protein designability and engineering; pp. 961–982.
- Dokholyan NV, Shakhnovich EI. Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001;312:289–307. doi: 10.1006/jmbi.2001.4949.
- Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134:341–352. doi: 10.1016/j.cell.2008.05.042.
- Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002;320:369–387. doi: 10.1016/S0022-2836(02)00442-4.
- Khare SD, Caplow M, Dokholyan NV. FALS mutations in Cu, Zn superoxide dismutase destabilize the dimer and increase dimer dissociation propensity: a large-scale thermodynamic analysis. Amyloid. 2006;13(4):226–235. doi: 10.1080/13506120600960486.
- Li B, et al. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics. 2009;25:2744–2750. doi: 10.1093/bioinformatics/btp528.
- Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucl Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509.
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical recipes: the art of scientific computing. 3rd edn. New York: Cambridge University Press; 2007.
- Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002;30:3894–3900. doi: 10.1093/nar/gkf493.
- Redler R, Dokholyan NV. The complex molecular biology of amyotrophic lateral sclerosis (ALS). Progr in Molec Biol and Transl Sci. 2012;107:215–262. doi: 10.1016/B978-0-12-385883-2.00002-3.
- Sahni N, et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell. 2015;161:647–660. doi: 10.1016/j.cell.2015.04.013.
- Stefl S, Nishi H, Petukh M, Panchenko AR, Alexov E. Molecular mechanisms of disease-causing missense mutations. J Mol Biol. 2013;425:3919–3936. doi: 10.1016/j.jmb.2013.07.014.
- Thomas PD, et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003;13:2129–2141. doi: 10.1101/gr.772403.
- Wang Z, Moult J. SNPs, protein structure, and disease. Hum Mutat. 2001;17:263–270. doi: 10.1002/humu.22.
- Williams PD, Pollock DD, Goldstein RA. Functionality and the evolution of marginal stability in proteins: inferences from lattice simulations. Evol Bioinform Online. 2006;2:91–101.
- Xu J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol. 2014;31:1787–1792. doi: 10.1093/molbev/msu130.
- Yin S, Ding F, Dokholyan NV. Eris: an automated estimator of protein stability. Nat Meth. 2007a;4:466–467. doi: 10.1038/nmeth0607-466.
- Yin S, Ding F, Dokholyan NV. Modeling backbone flexibility improves protein stability estimation. Structure. 2007b;15:1567–1576. doi: 10.1016/j.str.2007.09.024.
- Yue P, Li Z, Moult J. Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol. 2005;353:459–473. doi: 10.1016/j.jmb.2005.08.020.
- Zeldovich KB, Chen P, Shakhnovich EI. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci USA. 2007;104:16152–16157. doi: 10.1073/pnas.0705366104.


