Abstract
Understanding and predicting how mutations change protein structure and function, and how those changes relate to human health, is a critical step in translating the genomic revolution into actionable interventions. It is therefore pertinent to explore how mutations produce the structural changes that lead to pathogenic proteins, but because of the gap in protein structural knowledge, experimental data are lacking. Protein structure prediction methods, such as I-TASSER, make it possible to predict the structure of a given amino acid sequence, opening a new way to explore structural changes upon mutation when experimental information is not available. Using known mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) and ClinVar databases, we compare predicted structure-derived properties of wild type (WT) and mutated proteins and find differences between the local and global 3D structures of the WT and the mutants. The results for this relatively small sample reveal that the structural changes are quite diverse.
Keywords: Oncogenic missense variants, Protein structure prediction, Structural classification
1. Introduction
Understanding and predicting how mutations change protein structure and function, and how those changes relate to human health, is a critical task in translating the genomic revolution into actionable interventions [1]. However, since the effects of mutations on the structural properties of proteins are multifactorial, it is unclear which features of protein structural change are important in leading to pathogenicity [2]. Moreover, because experimental structural data are very limited for proteins in general, and even more so for mutated species [3, 4], very little is known about how protein structures and their properties change upon mutation [5, 6]. It is also poorly understood whether these changes can predict pathogenicity or provide ways to develop mechanistic hypotheses of pathogenicity. Computational methods appear to be a viable way to explore these questions, an approach strongly supported by recent advances in computational resources and in methods to predict 3D protein structures [7]. While a great deal of progress has been made in predicting variant pathogenicity [1, 8, 9, 10], the most accurate methods do not provide insight into the structural changes that alter protein function. We have demonstrated that careful comparison between wild type (WT) and mutated structures can provide insights on pathogenicity [11, 12], but general principles remain elusive. In this paper, we explore the structural changes induced by 26 known oncogenic missense mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) [13] and ClinVar [14] databases. We addressed the questions posed above by comparing the predicted 3D structures of WT and mutated proteins. We visualized the changes in the structures upon mutation, analyzing both overall and local structural changes.
We also extracted structural features from both the 3D predicted structures and sequences for the wild types (WT) and mutants. Lastly, we analyzed them for statistical significance, and by using unsupervised machine learning, we attempted to identify common structural changes leading to oncogenesis.
2. Methods
A set of oncogenic missense substitutions was selected based on their classification in the COSMIC [13] and ClinVar [14] databases. While the selection of these proteins is somewhat arbitrary, the selected oncogenic proteins represent a wide range of sizes and functions, and their number was kept reasonably low to keep computational times within achievable limits. The complete description of the variants and their WTs is shown in Table 1. FASTA files for each of the WT sequences were retrieved from UniProt [15]; the mutated sequences were obtained by manually modifying the WT sequences according to the missense mutation annotations from the COSMIC [13] and ClinVar [14] databases. To account for ensemble effects [16], each I-TASSER [17, 18, 19] run produced five structures corresponding to the centroids of its structure clusters, and two runs were performed per protein; all 10 resulting structures were retained for analysis. This approach was used because previous work [14] showed that performing more than one I-TASSER prediction makes it possible to explore a larger conformational space, which could be more representative of the actual structures in living organisms, and provides a larger data set for the clustering analysis attempted in this paper. To introduce structural variability into the dataset used for the clustering analysis, and to counter I-TASSER's tendency to produce consistently similar structures despite mutation, one set of predictions used the default settings and the duplicate set was generated using homolog exclusion (>30% similarity). All structure predictions were performed at the University of Utah CHPC (https://www.chpc.utah.edu/) on the Ember cluster, using nodes with 12 cores of 2.8 GHz Intel Xeon (Westmere X5660) processors, 24 GB memory, and Mellanox QDR Infiniband interconnect. Visual analysis and manipulation of the 3D structures were done using Chimera [20].
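The manual step of deriving a mutated sequence from its WT sequence can be illustrated with a short Python sketch. It is only an illustration, not the procedure used in this work: the function name `apply_missense` and the toy sequence are hypothetical, and the sketch assumes the amino acid change is written in the three-letter form used in Table 1 (e.g. Ala634Val) with 1-based residue numbering.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def apply_missense(wt_sequence, change):
    """Return the mutant sequence for a change written like 'Ala634Val'.

    Hypothetical helper: positions are assumed to be 1-based.
    """
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", change)
    if match is None:
        raise ValueError(f"unrecognized amino acid change: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown residue name in {change!r}")
    index = pos - 1  # convert to 0-based string index
    if not 0 <= index < len(wt_sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    # Guard against numbering mismatches between annotation and sequence.
    if wt_sequence[index] != THREE_TO_ONE[ref]:
        raise ValueError(
            f"expected {ref} at position {pos}, found {wt_sequence[index]}"
        )
    return wt_sequence[:index] + THREE_TO_ONE[alt] + wt_sequence[index + 1:]


# Toy sequence (not a real protein): alanine at position 4 becomes valine.
print(apply_missense("MKTAYIAK", "Ala4Val"))  # prints MKTVYIAK
```

Checking the reference residue before substituting is a deliberate choice: it catches cases where the annotation numbering does not match the retrieved sequence, for example because of a different isoform.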
The features used in the classification work (see Table 2) were extracted from the predicted 3D structures using DSSP [21] for solvent accessible surface area (SASA); I-TASSER [17, 18, 19] for C-scores; TM-align [22] for template modeling (TM) scores and RMSD (between reference wild types and mutants); FoldX [23] for total Gibbs free energy; RW and RWPlus [24, 25, 26] for pair-wise distance-dependent energy and side-chain orientation-dependent energy; and DFIRE and dDFIRE [25] for distance-related energy and angle-related energy. RW, RWPlus, DFIRE, and dDFIRE values calculated for both wild type and mutant predicted structures were previously used as structure-derived features in the STRUM stability change predictor by Quan et al. [27]. Statistical testing (Mann-Whitney-Wilcoxon) [28] was performed on the dataset, which was separated into the wild type and mutant features representing each protein structure, and p-value correction (Benjamini-Hochberg) was applied [29]. A clustering analysis (k-means clustering) was subsequently performed in RStudio [30] to evaluate the structure of the dataset.
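To make the per-feature statistical comparison concrete, the following Python sketch computes a two-sided Mann-Whitney-Wilcoxon test for one feature. It is an illustration under stated assumptions, not the analysis code of this work (which was run in RStudio): the function name `mann_whitney_u` is hypothetical, the p-value uses the normal approximation with a tie correction and no continuity correction, and the feature values in the example are made up.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p). Ties receive average ranks and the variance is
    tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of equal values starting at i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is identical
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Made-up values of one feature for WT and mutant predicted structures.
wt_feature = [101.2, 98.7, 103.4, 99.9, 100.5]
mutant_feature = [104.1, 106.3, 102.8, 107.0, 105.5]
u_stat, p_value = mann_whitney_u(wt_feature, mutant_feature)
print(u_stat, p_value)
```

For small samples an exact test is generally preferable to the normal approximation; the approximation is used here only to keep the sketch free of external dependencies.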
Table 1.
Protein | UniProt ID | PDB ID | ClinVar or COSMIC ID | AA Change | Associated Cancer |
---|---|---|---|---|---|
CADH1 | P12830 | 2O72 | | | |
CADH1, Mutant 1 | | | COSM19822 | Ala634Val | Breast Carcinoma |
CADH1, Mutant 2 | | | NM_004360.4(CDH1):c.1008G > T (p.Glu336Asp) | Glu336Asp | Hereditary Diffuse Gastric Cancer |
CBL | P22681 | 2Y1M | | | |
CBL, Mutant 1 | | | NM_005188.3(CBL):c.1186T > C (p.Cys396Arg) | Cys396Arg | Noonan syndrome-like disorder with juvenile myelomonocytic leukemia, Rasopathy |
CBL, Mutant 2 | | | COSM4385831 | Gln367Pro | Genital germ cell tumor |
CBL, Mutant 3 | | | COSM34052 | Tyr371His | Acute myeloid leukemia, juvenile myelomonocytic leukemia, chronic myelomonocytic leukemia |
RET | P07949 | 4CKJ | | | |
RET, Mutant 1 | | | NM_020975.4(RET):c.1465G > A (p.Asp489Asn) | Asp489Asn | Multiple endocrine neoplasia, type 2, not specified, Hereditary cancer-predisposing syndrome, Hirschsprung disease |
RET, Mutant 2 | | | COSM967 | Cys609Tyr | Pheochromocytoma |
RET, Mutant 3 | | | COSM87267 | Cys618Ser | Carcinoma of the thyroid |
RET, Mutant 4 | | | COSM966 | Cys634Arg | Pheochromocytoma, Carcinoma of the thyroid |
RET, Mutant 5 | | | NM_020975.4(RET):c.1336G > C (p.Gly446Arg) | Gly446Arg | Multiple endocrine neoplasia, type 2 |
RET, Mutant 6 | | | COSM1666596 | Gly691Ser | Acute myeloid leukemia |
VHL | P40337 | 4WQO | | | |
VHL, Mutant 1 | | | COSM14400 | Ser65Leu | Clear cell renal cell carcinoma |
VHL, Mutant 2 | | | NM_000551.3(VHL):c.292T > C (p.Tyr98His) | Tyr98His | Von Hippel Lindau syndrome |
TP53 | P04637 | 3Q01 | | | |
TP53, Mutant 1 | | | COSM11066 | His193Leu | Squamous cell carcinoma of the esophagus |
NF2 | P35240 | 1H4R | | | |
NF2, Mutant 1 | | | COSM23876 | Glu463Lys | Neurofibromatosis Type 2 |
RB1 | P06400 | 4ELJ | | | |
RB1, Mutant 1 | | | COSM1636647 | Ser634Pro | Retinoblastoma |
CALR | P27797 | 3POW | | | |
CALR, Mutant 1 | | | COSM1290873 | Phe46Tyr | Lymphoid Neoplasms |
FANCF | Q9NPI8 | 2IQC | | | |
FANCF, Mutant 1 | | | COSM4521389 | Gly370Asp | Squamous cell carcinoma of head and neck |
MUTYH | Q9UIF7 | 3N5N | | | |
MUTYH, Mutant 1 | | | COSM1645292 | Arg200Cys | Carcinoma of Colon |
HRAS | P01112 | 3K8Y | | | |
HRAS, Mutant 1 | | | COSM490 | Gly13Asp | Chronic Myelogenous Leukemia |
PPP2R1A | P30153 | 1B3U | | | |
PPP2R1A, Mutant 1 | | | COSM51253 | Arg183Gln | Endometrial Carcinoma |
NT5C2 | P49902 | 2J2C | | | |
NT5C2, Mutant 1 | | | COSM4011341 | Arg413His | Gastric Adenocarcinoma |
FH | P07954 | 3E04 | | | |
FH, Mutant 1 | | | COSM906408 | Asp179Asn | Colonic Adenocarcinoma |
MAP2K1 | Q02750 | 3EQI | | | |
MAP2K1, Mutant 1 | | | COSM1235478 | Lys57Asn | Pulmonary Adenocarcinoma |
PIK3CA | P42336 | 4TV3 | | | |
PIK3CA, Mutant 1 | | | COSM1041443 | Met1Val | Glioma |
MAP2K4 | P45985 | 3ALN | | | |
MAP2K4, Mutant 1 | | | COSM137092 | Arg134Trp | Colonic Adenocarcinoma |
Table 2.
Feature | Description | Software used |
---|---|---|
RMSD | Root mean square deviation of atomic position between reference and predicted structures for WT and mutated structures respectively. | UCSF Chimera [20] |
Cscore | Confidence score from structure prediction | I-TASSER [17, 18, 19] |
TMscore | Template modeling score | TM-align [22] |
SASA | Solvent accessible surface area of protein structure | DSSP [21] |
Energy (kcal/mol) | Total stability energy | FoldX [23] |
calRW (kcal/mol) | Pair-wise distance-dependent energy | calRW [26] |
calRWPlus (kcal/mol) | Side-chain orientation-dependent energy | calRWPlus [26] |
DFIRE2 | Distance-related energy | DFIRE2 [25] |
dDFIRE | Angle-related energy | dDFIRE [25] |
3. Results and discussion
Figures 1–17 compare the overall and local predicted 3D structures of representative I-TASSER models for the WT and the different mutations from Table 1. Although in some cases experimental structures are available for only short portions of the sequence, we are reasonably confident in the structure predictions and results of this study for two reasons: the structure prediction method, i.e., I-TASSER, and the use of 10 predicted structures per entry. Because I-TASSER uses template-based threading as part of its structure prediction, we can rely on homologous structures to make reasonable predictions for fragments that have not been experimentally resolved. Furthermore, if no templates can be found, these fragments are assembled using ab initio folding. Because we performed 10 structure predictions for each entry, we can average out biased predictions and make more reasonable comparisons across the features we studied and the conformations we visualized. Taken together, these points make us reasonably confident in the structural analysis.
In all cases, the figures depict the structures showing the maximum overlap among them. The most remarkable features observed are discussed, for convenience, in the figure captions. All gene descriptions were taken from GeneCards [31]. It is important to note that the observations reported here are quite generic: they do not attempt to provide detailed insight into the structural changes observed for each protein, but rather to extract overall observations that may help identify key structural factors (if any) that determine pathogenicity. All structures depicted in the figures were obtained using I-TASSER and are available at: http://home.chpc.utah.edu/~u0033399/Protein%20Structural%20Changes%20for%20Oncogenic%20Missense%20Variants/.
From the discussions in Figures 1–17, it is difficult to extract any significant trends that associate changes in the 3D structures with pathogenicity. We observe cases in which there are no changes in either the overall or the local structures (TP53, CALR, FANCF, MUTYH, PPP2R1A, FH, MAP2K1, MAP2K4); cases in which there are noticeable changes in the overall structure but not in the local one (CADH1, PIK3CA); cases in which there are no noticeable changes in the overall structure but relevant ones in the local one (RB1, HRAS, NT5C2); and, finally, examples in which both the local and overall structures are changed by the mutation (CBL, RET). Even for the cases in which no noticeable changes were observed in this analysis, we cannot rule out that minor changes in geometry and/or electrostatics play a critical role in defining the pathogenicity of the variant [11, 12].
Table 3 presents the results of comparing the distributions of the structural features in Table 2 for the WT and mutated 3D predicted structures. The Mann-Whitney-Wilcoxon test [28] examines whether two samples are likely to derive from the same population, taking as the null hypothesis that the distributions are equal. Therefore, a corrected p-value below the threshold, here p < 0.05, is interpreted as indicating that the feature may be used to distinguish between WT and mutated protein structures. This holds for most of the structural properties in Table 2; the exceptions are the root mean square deviation of the atomic positions between WT and mutated structures and the template modeling (TM) score. The remaining features are therefore candidates for classifying the structures using, for instance, clustering analysis.
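The Benjamini-Hochberg correction applied to the per-feature p-values can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for Table 3: the function name `benjamini_hochberg` is hypothetical and the raw p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease with rank (the step-up monotonicity constraint).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.5]
corrected = benjamini_hochberg(raw)
# Features whose corrected p-value falls below 0.05 would be flagged.
flagged = [p < 0.05 for p in corrected]
print(corrected, flagged)
```

Each raw p-value is scaled by the number of tests divided by its rank, so the smallest p-values are penalized the most while the largest is left unchanged.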
Table 3.
Predicted Structural Feature | P-value After Correction (BH) |
---|---|
Root mean square deviation of atomic position | 5.815e-01 |
Confidence score | 2.032e-02 |
Template modeling Score | 5.6e-01 |
Solvent accessible surface area | 1.6e-05 |
Total Gibbs free energy | 2.03e-02 |
Pairwise distance-dependent energy | 2.56e-02 |
Sidechain orientation-dependent energy | 2.56e-02 |
Distance-related energy | 2.56e-02 |
Angle-related energy | 4.78e-02 |
A cluster density calculation (within-cluster sum of squares) was performed on the dataset, and k = 2 was found to be the optimum number of clusters for the clustering analysis. K-means clustering (k = 2) was performed on the dataset and yielded a visible separation when visualized (Figure 18). However, a purity analysis of the clusters showed that the WT and mutant proteins were not separated into the two clusters. Analysis of the principal components (Figure 18) indicates that the first dimension along which the clusters are divided is dominated by the solvent accessible surface area (SASA). This suggests that the proteins studied here can be classified into those that are relatively extended and those that are more compact, but that this classification is not changed by the oncogenic mutations considered here. This is consistent with the observations derived from Figures 1–17, which do not show any consistent changes in the overall protein structures upon mutation. Furthermore, we performed density cluster calculations while consecutively eliminating the solvent accessible surface area (SASA), the pair-wise distance-dependent energy (calRW), the side-chain orientation-dependent energy (calRWPlus), and the total stability energy (FoldX). At each step, the feature eliminated was the one making the most significant contribution to the first principal component (see Figures S1-S4). In all cases, we observe good clustering according to the first principal component, but again the clustering does not discriminate (by purity analysis) between the WT and the mutated structures. No differences were noticed between the clustering properties of structures calculated with full or partial template inclusion. This is again consistent with the lack of clear trends defining the structural changes upon mutation for this set of oncogenic proteins.
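The three quantities used in this paragraph (k-means assignments, within-cluster sum of squares, and cluster purity against the WT/mutant labels) can be illustrated with a small dependency-free Python sketch. It is not the RStudio analysis used in this work. The function names are hypothetical, the farthest-point seeding is an assumption chosen only to make the toy example reproducible, and the two-dimensional points are made up.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def initial_centroids(points, k):
    """Deterministic farthest-point seeding (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def k_means(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    total = 0
    for cluster in set(labels):
        counts = Counter(c for lab, c in zip(labels, classes) if lab == cluster)
        total += counts.most_common(1)[0][1]
    return total / len(labels)


# Made-up data: two compact groups, each mixing WT and mutant structures.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
classes = ["WT", "MUT", "WT", "MUT", "WT", "MUT", "WT", "MUT"]

for k in (1, 2, 3):
    print(k, k_means(points, k)[1])  # WCSS per k, for an elbow-style check

labels, _ = k_means(points, 2)
print(purity(labels, classes))  # prints 0.5
```

In this toy example the two clusters are cleanly separated, yet the purity is 0.5, the value expected when cluster membership carries no information about a balanced two-class label. This is the same qualitative situation described above: visible clustering that does not discriminate WT from mutated structures.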
The RMSD values were calculated by comparing our wild type and mutant structures to their respective reference structures. Here they are used as an internal control of the quality of the structures predicted by I-TASSER, because a small RMSD between the reference structure and the other four predicted structures is considered an indication of a reliable prediction; they do not, however, differentiate WT from mutated structures.
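For reference, the RMSD between two structures reduces to a simple formula once the atoms have been paired and the structures superimposed. The Python sketch below shows only that final step and is not the tool used in this work: the function name `rmsd` is hypothetical, it assumes the two coordinate lists are already superimposed and list equivalent atoms in the same order, and the coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate sets.

    Assumes the structures are already superimposed and that the
    i-th atom of one set corresponds to the i-th atom of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Made-up coordinates: the second set is shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, predicted))  # prints 1.0
```

Structure comparison programs additionally search for the alignment and rigid-body superposition that minimize this quantity, which is omitted here.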
4. Conclusions
Based on the above findings, several features differed significantly between predicted WT and mutated protein structures and could be useful for distinguishing them, but the clustering analyses showed no discernible differences in structural properties that could be found using an unsupervised method, i.e., k-means clustering. This is consistent with the observations from Figures 1–17, which do not show any consistent changes in the overall protein structures upon mutation, precluding any further statistical analysis. While we did not find overall principles that can be used to classify WT and oncogenic variants in this dataset, the judicious use of 3D structure prediction methods remains a valuable tool to understand oncogenic mutations further, as demonstrated previously [11, 12, 32, 33, 34, 35]. We recognize that comparing these results with those obtained from neutral, non-pathogenic variants could provide more detail on the problem and might help differentiate the structural effects of different types of variants. Additionally, we highlight that, because experimental structures, which should still be considered the gold standard, are of limited availability, the use of 3D structure prediction to understand molecular mechanisms of pathogenicity is a necessary albeit difficult task, given the difficulties in experimentally determining mutated structures and the large volume of reported protein-coding sequences. In our study, although we found statistically significant differences in the predicted features of our dataset between wild type structures and those affected by oncogenic mutations, these changes were not conclusively detected in our other analyses.
This could be due to the variety in size and function of proteins in our dataset, which could have contributed to inconclusive clustering analyses despite finding these statistically-significant predicted structural features. Therefore, in future work, the possibility of finding higher-order structural features that can be used to classify WT and oncogenic protein structures could be explored using a much larger dataset, which includes nonpathogenic variants, more structural features, and supervised machine learning algorithms, e.g. Random Forest, for binary classification, i.e., WT/mutant classification.
Declarations
Author contribution statement
Rolando Hernandez: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Wrote the paper.
Julio C. Facelli: Conceived and designed the experiments; Wrote the paper.
Funding statement
This work was supported by the Utah Center for Clinical and Translational Science funded by NCATS award 1ULTR002538 and the NLM Training grant T15 LM00712418.
Data availability statement
No data was used for the research described in the article.
Competing interest statement
The authors declare no conflict of interest.
Additional information
No additional information is available for this paper.
Acknowledgements
Computer resources were provided by the University of Utah Center for High-Performance Computing, which has been partially funded by the NIH Shared Instrumentation Grant 1S10OD02164401A1.
References
- 1. Kircher M., Witten D.M., Jain P., O'Roak B.J., Cooper G.M., Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 2014;46:310–315. doi: 10.1038/ng.2892.
- 2. Reva B., Antipin Y., Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 2011;39:e118. doi: 10.1093/nar/gkr407.
- 3. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H. The protein data bank. Nucleic Acids Res. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235.
- 4. Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34(Database issue):D187–D191. doi: 10.1093/nar/gkj161.
- 5. Arodź T., Płonka P.M. Effects of point mutations on protein structure are nonexponentially distributed. Proteins: Struct. Funct. Bioinform. 2012;80(7):1780–1790. doi: 10.1002/prot.24073.
- 6. Zhang C. Iowa State University; 2016. Protein Wild-type and Mutant Ensemble Database.
- 7. Kryshtafovych A., Schwede T., Topf M., Fidelis K., Moult J. Critical assessment of methods of protein structure prediction (CASP)—round XIII. Proteins: Struct. Funct. Bioinform. 2019;87(12):1011–1020. doi: 10.1002/prot.25823.
- 8. Pejaver V., Urresti J., Lugo-Martinez J., Pagel K.A., Lin G.N., Nam H.-J. MutPred2: inferring the molecular and phenotypic impact of amino acid variants. bioRxiv. 2017:134981. doi: 10.1038/s41467-020-19669-x.
- 9. Rogers M.F., Shihab H.A., Mort M., Cooper D.N., Gaunt T.R., Campbell C. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics. 2017;34(3):511–513. doi: 10.1093/bioinformatics/btx536.
- 10. Quan L., Lv Q., Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics. 2016;32(19):2936–2946. doi: 10.1093/bioinformatics/btw361.
- 11. Teerlink C.C., Huff C., Stevens J., Yu Y., Holmen S.L., Silvis M.R. A nonsynonymous variant in the GOLM1 gene in cutaneous malignant melanoma. JNCI: J. Natl. Cancer Inst. 2018;110(12):1380–1385. doi: 10.1093/jnci/djy058.
- 12. Li C., Liu T., Liu B., Hernandez R., Facelli J.C., Grossman D. A novel CDKN2A variant (p16L117P) in a patient with familial and multiple primary melanomas. Pigm. Cell Melanoma Res. 2019;32(5):734–738. doi: 10.1111/pcmr.12787.
- 13. Bamford S., Dawson E., Forbes S., Clements J., Pettett R., Dogan A. The COSMIC (Catalogue of somatic mutations in cancer) database and website. Br. J. Canc. 2004;91(2):355–358. doi: 10.1038/sj.bjc.6601894.
- 14. Landrum M.J., Lee J.M., Riley G.R., Jang W., Rubinstein W.S., Church D.M. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42:D980–D985. doi: 10.1093/nar/gkt1113.
- 15. Consortium T.U. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2018;47(D1):D506–D515. doi: 10.1093/nar/gky1049.
- 16. Wen J., Scoles D.R., Facelli J.C. Structure prediction of polyglutamine disease proteins: comparison of methods. BMC Bioinf. 2014;15 doi: 10.1186/1471-2105-15-S7-S11.
- 17. Roy A., Kucukural A., Zhang Y. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 2010;5(4):725–738. doi: 10.1038/nprot.2010.5.
- 18. Yang J., Yan R., Roy A., Xu D., Poisson J., Zhang Y. The I-TASSER Suite: protein structure and function prediction. Nat. Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- 19. Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinf. 2008;9:40. doi: 10.1186/1471-2105-9-40.
- 20. Pettersen E.F., Goddard T.D., Huang C.C., Couch G.S., Greenblatt D.M., Meng E.C. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084.
- 21. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22(12):2577–2637. doi: 10.1002/bip.360221211.
- 22. Zhang Y., Skolnick J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33(7):2302–2309. doi: 10.1093/nar/gki524.
- 23. Schymkowitz J., Borg J., Stricher F., Nys R., Rousseau F., Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33(SUPPL. 2):W382–W388. doi: 10.1093/nar/gki387.
- 24. Yang Y., Zhou Y. Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions. Protein Sci. 2008;17(7):1212–1219. doi: 10.1110/ps.033480.107.
- 25. Yang Y., Zhou Y. Specific interactions for ab initio folding of protein terminal regions with secondary structures. Proteins: Struct. Funct. Bioinform. 2008;72(2):793–803. doi: 10.1002/prot.21968.
- 26. Zhang J., Zhang Y. A novel side-chain orientation dependent potential derived from random-walk reference state for protein fold selection and structure prediction. PloS One. 2010;5(10) doi: 10.1371/journal.pone.0015386.
- 27. Quan L., Lv Q., Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics. 2016;32(19):2936–2946. doi: 10.1093/bioinformatics/btw361.
- 28. Mann H.B., Whitney D.R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 1947;18(1):50–60.
- 29. Haynes W. Benjamini–Hochberg method. In: Dubitzky W., Wolkenhauer O., Cho K.-H., Yokota H., editors. Encyclopedia of Systems Biology. Springer New York; New York, NY: 2013. p. 78.
- 30. Allaire J. RStudio: integrated development environment for R. Boston, MA. 2012;770:394.
- 31. Safran M., Dalah I., Alexander J., Rosen N., Iny Stein T., Shmoish M. GeneCards Version 3: the human gene integrator. Database (Oxford) 2010;2010:baq020–baq. doi: 10.1093/database/baq020.
- 32. Gerasimavicius L., Liu X., Marsh J.A. Identification of pathogenic missense mutations using protein stability predictors. Sci. Rep. 2020;10(1):15387. doi: 10.1038/s41598-020-72404-w.
- 33. Casadio R., Vassura M., Tiwari S., Fariselli P., Luigi Martelli P. Correlating disease-related mutations to their effect on protein stability: a large-scale analysis of the human proteome. Hum. Mutat. 2011;32(10):1161–1170. doi: 10.1002/humu.21555.
- 34. Nielsen S.V., Stein A., Dinitzen A.B., Papaleo E., Tatham M.H., Poulsen E.G. Predicting the impact of Lynch syndrome-causing missense mutations from structural calculations. PLoS Genet. 2017;13(4) doi: 10.1371/journal.pgen.1006739.
- 35. Pey A.L., Stricher F., Serrano L., Martinez A. Predicted effects of missense mutations on native-state stability account for phenotypic outcome in phenylketonuria, a paradigm of misfolding diseases. Am. J. Hum. Genet. 2007;81(5):1006–1024. doi: 10.1086/521879.