Abstract
In 2019, we released Missense3D, which identifies stereochemical features that are disrupted by a missense variant, such as the introduction of a buried charge. Missense3D analyses the effect of a missense variant on a single structure and thus may fail to identify as damaging those surface variants that disrupt a protein interface, i.e., a protein–protein interaction (PPI) site. Here we present Missense3D-PPI, designed to predict the effect of missense variants at PPI interfaces. Our development dataset comprised 1,279 missense variants (pathogenic n = 733, benign n = 546) in 434 proteins and 545 experimental structures of PPI complexes. Benchmarking of Missense3D-PPI was performed after dividing the dataset into training (320 benign and 320 pathogenic variants) and testing (226 benign and 413 pathogenic variants) sets. Structural features affecting PPI, such as disruption of interchain bonds and introduction of unbalanced charged interface residues, were analysed to assess the impact of the variant at the PPI site. The performance of Missense3D-PPI was superior to that of Missense3D: sensitivity 44% versus 8% and accuracy 58% versus 40%, p = 4.23 × 10⁻¹⁶. However, the specificity of Missense3D-PPI was lower than that of Missense3D (84% versus 98%). On our dataset, Missense3D-PPI’s accuracy was superior to that of BeAtMuSiC (p = 3.4 × 10⁻⁵), mCSM-PPI2 (p = 1.5 × 10⁻¹²) and MutaBind2 (p = 0.0025). Missense3D-PPI represents a valuable tool for predicting the structural effect of missense variants on biological protein networks and is available at the Missense3D web portal (http://missense3d.bc.ic.ac.uk).
Keywords: variants, prediction, protein structure, protein-protein interaction, disease
Introduction
Residues on the protein surface that are not directly involved in function are generally more tolerant to amino acid substitutions than residues in the buried core of a protein.1 However, residues involved in protein–protein interaction (PPI), also known as interface residues, are an exception to this principle. As previously demonstrated by our group and others,1,2 interface residues are enriched in disease-causing amino acid substitutions. The damaging effect of variants affecting protein interaction sites is difficult to predict, and the majority of in silico variant prediction methods perform worse on variants located at the interface than on variants in the remaining protein surface or the buried protein interior.3
Genetic variants causing the disruption of protein interfaces are an important contributor to human disease.4,5 These variants generally preserve the folding and stability of the monomeric protein but may impair its function by affecting the many biological processes that rely on protein interaction, such as trafficking and signalling. Identification and prediction of the effect of a variant on PPI requires knowledge of the residues forming a protein interface. In recent years there has been an increase in the availability of three-dimensional structures of protein complexes, both experimentally solved and obtained from protein docking and homology modelling.6 These 3D coordinates are publicly available from databases such as PDB,7 Interactome3D,8 GWYRE9 and PrePPI.10 Although at present the coverage of the protein interactome remains limited, we can expect an exponential increase in 3D coordinates of PPI complexes in the coming years as a result of the recent breakthrough in protein modelling achieved by AlphaFold and similar approaches that use deep learning.11,12
C. Pennica, G. Hanna, S.A. Islam, et al. Journal of Molecular Biology 435 (2023) 168060
The availability of 3D coordinates allows us not only to predict the damaging effect of a variant, but also to understand the molecular mechanisms by which it affects protein structure/function. In 2019, we launched Missense3D,13 which predicts the effect of a variant on the folding and stability of a monomeric protein. However, Missense3D, like other algorithms such as HOPE14 and SAAP,15 uses the 3D coordinates of a single protein chain and thus may fail to identify the detrimental effect of variants located on the protein surface that affect PPI. Such a damaging effect can only be predicted when the 3D coordinates of a protein complex are taken into account, an approach used by algorithms such as the energy-based programs BeAtMuSiC,16 MutaBind217 and mCSM-PPI2.18 However, most of these in silico prediction tools have been trained on engineered protein variants deposited in databases such as SKEMPI19 and ProTherm20 and do not perform equally well when used on other datasets of variants.21,22
To date, the use of 3D structures to predict the effect of a variant remains relatively limited compared to the use of sequence conservation, thus calling for the development of new user-friendly algorithms that can be easily implemented to enhance variant prediction. We present Missense3D-PPI, a purely structure-based algorithm for the prioritization and characterization of missense variants occurring at protein–protein interfaces. Missense3D-PPI is available at the Missense3D web portal (https://missense3d.bc.ic.ac.uk).
Algorithm
Experimental structures and missense variants
Figure 1 presents the Missense3D-PPI pipeline. We extracted ~4 million human missense variants from our in-house Missense3D-DB database,23 which contains the phenotypic annotation of variants from ClinVar24 and UniProt25 and minor allele frequency (MAF) data from gnomAD.26 In order to identify missense variants occurring at a PPI site, we extracted 16,609 high-resolution (≤2.5 Å) X-ray crystal structures of human dimers and multimers from the Protein Data Bank (PDB).27 For each protein complex, we selected the experimental structure with the best resolution and without mutations in the protein interface.
Figure 1. Missense3D-PPI pipeline used to analyse the structural effect of amino acid substitutions.
Interface residues were defined as any residue with a relative solvent accessibility (RSA) difference ≥5% between the monomeric and the protein complex structure, calculated using an in-house program. Each interface residue was categorised as core, rim or support according to the change in RSA between the monomeric (RSAmonomer) and complex (RSAcomplex) form:
- core residue: RSAmonomer ≥ 9% and RSAcomplex < 9%; RSAmonomer − RSAcomplex ≥ 5%
- rim residue: RSAmonomer ≥ 9% and RSAcomplex ≥ 9%; RSAmonomer − RSAcomplex ≥ 5%
- support residue: RSAmonomer < 9% and RSAcomplex < 9%; RSAmonomer − RSAcomplex ≥ 5%
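The core, rim and support definitions can be expressed as a small decision rule. The following is a minimal sketch, not the in-house program: the function name and constants are illustrative assumptions, and RSA values are assumed to be supplied as percentages that have already been computed elsewhere.

```python
from typing import Optional

RSA_BURIED_CUTOFF = 9.0  # % RSA separating buried from exposed (threshold from the text)
RSA_MIN_CHANGE = 5.0     # minimum % RSA lost on complex formation (threshold from the text)


def classify_interface_residue(rsa_monomer: float, rsa_complex: float) -> Optional[str]:
    """Return 'core', 'rim' or 'support', or None for a non-interface residue.

    Hypothetical helper: both arguments are relative solvent accessibilities in percent.
    """
    if rsa_monomer - rsa_complex < RSA_MIN_CHANGE:
        return None  # RSA change below 5%: not an interface residue
    if rsa_monomer >= RSA_BURIED_CUTOFF:
        # Exposed in the monomer: core if buried in the complex, rim otherwise
        return "core" if rsa_complex < RSA_BURIED_CUTOFF else "rim"
    # Buried in the monomer; with a change >= 5% it is necessarily buried in the complex
    return "support"


if __name__ == "__main__":
    for pair in [(40.0, 3.0), (40.0, 20.0), (8.0, 2.0), (40.0, 38.0)]:
        print(pair, classify_interface_residue(*pair))
```

Because the three categories share the ≥5% change condition, checking it first leaves only the 9% buried/exposed threshold to separate them.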
We based our definition of core and rim on our definition of change between buried and exposed status,13 but we acknowledge that several other definitions can be found in the literature. Only variants occurring at interface residues were retained, and the final dataset comprised 1,279 missense variants (pathogenic n = 733, benign n = 546) in 434 proteins and 545 PDB coordinates of PPI complexes. The benign dataset included variants with an annotation of “benign” from ClinVar or UniProt and variants with MAF > 1% and no “pathogenic” annotation in ClinVar and/or UniProt. Human leukocyte antigen (HLA) proteins were excluded from the final dataset because of their highly polymorphic antigen-binding amino acid residues.28 Over 1,000 naturally occurring haemoglobin variants have been described and extensively studied clinically and in vitro.29 In our dataset, haemoglobin variants annotated as “unstable” were also included in the “damaging” dataset.
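The labelling rules for benign and pathogenic variants can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function name, the simplified annotation strings and the decision to let a “pathogenic” annotation take precedence over a conflicting “benign” one are all hypothetical, and the HLA exclusion and the haemoglobin “unstable” rule are not modelled.

```python
from typing import Optional

MAF_BENIGN_CUTOFF = 0.01  # MAF > 1%, as stated in the text


def assign_label(clinvar: Optional[str], uniprot: Optional[str],
                 maf: Optional[float]) -> Optional[str]:
    """Return 'pathogenic', 'benign', or None when a variant cannot be labelled.

    Annotations are assumed to be simplified to 'pathogenic' or 'benign' (or None).
    """
    annotations = {a.lower() for a in (clinvar, uniprot) if a}
    if "pathogenic" in annotations:
        return "pathogenic"  # assumption: pathogenic wins over any conflicting label
    if "benign" in annotations:
        return "benign"
    if maf is not None and maf > MAF_BENIGN_CUTOFF:
        return "benign"      # common variant with no pathogenic annotation
    return None


if __name__ == "__main__":
    print(assign_label(None, None, 0.05))
    print(assign_label("pathogenic", None, 0.05))
```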
Missense3D-PPI pipeline and definition of damaging structural features
For each variant, a mutant structure was generated using SCWRL,30 as described in Ittisoponpisan et al.13 Briefly, the mutant structure was generated by removing the side chain of the wild-type query residue and the side chains of all residues within 5 Å of the query residue (defined by any pair of inter-residue atoms closer than 5 Å). Subsequently, the side chain of the mutant residue was introduced and the structure repacked. The wild-type and mutant structures were compared, and a variant was considered damaging if at least one of the structural features presented in Table S1, affecting two or more interchain residues, was identified. The features “Interface Gly, Tyr or Trp replaced”, defined as the substitution of an interface glycine, tyrosine or tryptophan with any other residue, were introduced because of the functional importance of these residues within the protein interface.31 Moreover, as in Missense3D, 1 Å was added to the hydrogen bond and salt bridge cut-off distances in Missense3D-PPI to account for the use of 3D models and structures with poor resolution.13
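The 5 Å neighbourhood used to decide which side chains are repacked can be illustrated with a short distance search. This is a sketch only: the data layout (a dictionary mapping a chain ID and residue number to a list of atom coordinates) and the function name are assumptions made for the example, and no side-chain modelling is performed here.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[float, float, float]  # x, y, z coordinates in Å
Residue = Tuple[str, int]          # (chain ID, residue number)


def residues_within(structure: Dict[Residue, List[Atom]], query: Residue,
                    cutoff: float = 5.0) -> List[Residue]:
    """Return residues with at least one atom closer than `cutoff` Å to any query atom."""
    query_atoms = structure[query]
    neighbours = []
    for residue, atoms in structure.items():
        if residue == query:
            continue
        # Any single inter-residue atom pair under the cutoff is enough
        if any(math.dist(a, b) < cutoff for a in query_atoms for b in atoms):
            neighbours.append(residue)
    return neighbours


if __name__ == "__main__":
    toy = {
        ("A", 10): [(0.0, 0.0, 0.0)],
        ("A", 11): [(3.0, 0.0, 0.0)],
        ("B", 50): [(0.0, 4.9, 0.0)],   # interchain neighbour
        ("B", 51): [(10.0, 0.0, 0.0)],  # too far away
    }
    print(residues_within(toy, ("A", 10)))
```

The search deliberately ignores chain boundaries, so neighbours on a partner chain are picked up in the same pass as those on the query chain.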
Evaluation of performance and benchmarking
The final dataset of 1,279 variants was split into a training and a testing set. To avoid an overlap of homologous proteins between the training and testing sets, homology between proteins was assessed using the HH-suite,32 as detailed in Supplementary Data (Suppl Material and Figure S1). Clusters of homologous proteins were distributed to the training and testing set to obtain a similar ratio of variants in the two sets. The characteristics of the final datasets are described in Table S2.
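The key constraint of the split is that each cluster of homologous proteins goes entirely into one set. The sketch below shows one simple way to honour that constraint; it is an assumption-laden illustration, not the procedure detailed in the Supplementary Data. The greedy largest-first strategy, the function name and the use of total variant count as the balancing criterion are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_clusters(cluster_sizes: Dict[str, int],
                   train_fraction: float = 0.5) -> Tuple[List[str], List[str]]:
    """Assign whole clusters to train or test so no cluster straddles the split.

    `cluster_sizes` maps a (hypothetical) cluster ID to its number of variants.
    """
    target = train_fraction * sum(cluster_sizes.values())
    train: List[str] = []
    test: List[str] = []
    n_train = 0
    # Largest clusters first; ties broken by name for reproducibility
    for cluster, n in sorted(cluster_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if n_train + n <= target:
            train.append(cluster)
            n_train += n
        else:
            test.append(cluster)
    return train, test


if __name__ == "__main__":
    print(split_clusters({"c1": 5, "c2": 3, "c3": 2}))
```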
The performance of Missense3D-PPI was calculated as for the Missense3D algorithm.13 Briefly, variants causing at least one damaging structural feature according to Missense3D-PPI were considered true positives (TP) if annotated as damaging and false positives (FP) otherwise. The following measures were used to assess the performance of Missense3D-PPI: sensitivity, specificity, true positive rate (TPR), false positive rate (FPR), TPR/FPR ratio, accuracy and Matthews Correlation Coefficient (MCC) (see also Supplementary Data).
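For reference, the performance measures listed above follow from the four cells of a confusion matrix using their standard definitions. The sketch below uses made-up counts purely for illustration; the function name is hypothetical and the numbers are not taken from the study.

```python
import math
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity and TPR are the same quantity
    fpr = fp / (fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tpr,
        "specificity": tn / (tn + fp),
        "fpr": fpr,
        "tpr_fpr_ratio": tpr / fpr if fpr > 0 else math.inf,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom > 0 else 0.0,
    }


if __name__ == "__main__":
    # Illustrative counts only
    for name, value in performance(tp=40, fp=10, tn=90, fn=60).items():
        print(f"{name}: {value:.3f}")
```

Specificity is simply 1 − FPR, so a TPR/FPR ratio above 1 means a feature fires more often on damaging than on benign variants.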
Missense3D-PPI was benchmarked against Missense3D,13 MutaBind2,17 BeAtMuSiC16 and mCSM-PPI2.18 The last three methods are based on energy calculation and provide a ΔΔG value as an output. We defined variants as damaging if they resulted in ΔΔG ≥ 1.5 kcal/mol or ≤ −1.5 kcal/mol, and as neutral otherwise.33 Although a ΔΔG ≥ 1.5 kcal/mol is a widely used cut-off, for completeness, a comparison of the performance of the predictors at different energy thresholds is available in Tables S3 and S4. McNemar’s test34,35 was used to compare the performance of Missense3D-PPI against that of the other algorithms.
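The two steps of the benchmark, thresholding ΔΔG and comparing two predictors with McNemar’s test, can be sketched with the standard library alone. The text does not state which variant of McNemar’s test was used; the continuity-corrected chi-square form shown here is an assumption, and the function names and example counts are illustrative.

```python
import math

DDG_CUTOFF = 1.5  # kcal/mol, symmetric cut-off from the text


def is_damaging(ddg: float, cutoff: float = DDG_CUTOFF) -> bool:
    """A variant is called damaging if |ΔΔG| reaches the cut-off in either direction."""
    return abs(ddg) >= cutoff


def mcnemar_p(only_a_correct: int, only_b_correct: int) -> float:
    """Continuity-corrected McNemar test on the discordant pairs (assumed variant).

    only_a_correct: variants predicted correctly by method A but not by method B.
    only_b_correct: variants predicted correctly by method B but not by method A.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    stat = (abs(only_a_correct - only_b_correct) - 1) ** 2 / n
    # Survival function of a chi-square with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    print(is_damaging(-1.8), is_damaging(0.7))
    print(mcnemar_p(30, 10))  # illustrative discordant counts
```

Only variants on which the two predictors disagree contribute to the statistic, which is why the test suits paired comparisons on a shared test set.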
Results
Missense3D-PPI was trained on a dataset of 640 variants (320 pathogenic and 320 benign; 435 occurring in a rim residue, 196 in a core residue and 9 in a support residue; Table S2). These variants were harboured by 310 human proteins and mapped onto 375 human protein complexes. On the training set, salt bridge and H-bond breakage/formation and charge-related features performed best on core and support residues. These features were, therefore, not used on rim residues in the testing set. (The results on the training set are presented in Figure S2.)
The performance of Missense3D-PPI was assessed on a test set, which comprised 639 interface variants (413 damaging and 226 benign) occurring in 170 protein complexes and 124 unique human proteins. The 20 structural features analysed by Missense3D-PPI are presented in Table S1. In the test set, no variants caused a breakage of disulphide bonds, a feature that on the training set had obtained a TPR/FPR ratio of 7.4. All structural features showed a TPR/FPR > 1, suggesting that Missense3D-PPI can distinguish between damaging and neutral variants (Figure S3). Missense3D-PPI achieved an accuracy of 58% with a sensitivity (TPR) of 44% and a specificity of 84%. Missense3D-PPI performed better on core than on rim residues (accuracy: 62% versus 56%, sensitivity: 63% versus 25%, specificity: 56% versus 91%; Figures S4 and S5). Only 16 residues were annotated as ‘support’, a number too small to allow us to calculate the accuracy of the predictor on this type of interface residue.
The performance of Missense3D-PPI was compared to that of our in-house Missense3D, which assesses the effect of missense variants using the 3D coordinates of the monomeric protein. The performance of Missense3D-PPI was superior to that of Missense3D: sensitivity improved from 8% to 44% and the overall accuracy from 40% to 58%, p = 4.23 × 10⁻¹⁶ (McNemar’s test). When comparing the accuracy of Missense3D-PPI to that of other structure-based predictors for variants at protein interfaces, Missense3D-PPI was consistently superior to BeAtMuSiC, which is based on residue-residue potentials (p = 3.4 × 10⁻⁵), mCSM-PPI2 (p = 1.5 × 10⁻¹²) and MutaBind2 (p = 0.0025, Table 1).
Table 1. Missense3D-PPI performance compared to other algorithms.
| Metric | Missense3D-PPI | Missense3D | MutaBind2 | BeAtMuSiC | mCSM-PPI2 |
|---|---|---|---|---|---|
| MCC | 0.28 | 0.12 | 0.21 | 0.20 | 0.16 |
| Accuracy | 58% | 40% | 50% | 50% | 43% |
| Sensitivity | 44% | 8% | 27% | 27% | 14% |
| Specificity | 84% | 98% | 90% | 90% | 96% |
| TPR | 0.44 | 0.08 | 0.27 | 0.27 | 0.14 |
| FPR | 0.16 | 0.02 | 0.10 | 0.10 | 0.04 |
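As a reading aid for Table 1, accuracy is the class-size-weighted average of sensitivity and specificity. The sketch below recomputes it from the rounded table values and the test-set composition (413 damaging, 226 benign). It assumes every method returned a prediction for all 639 variants, which the text does not state, so small discrepancies from the reported figures are expected.

```python
N_DAMAGING = 413  # test-set damaging variants (from the text)
N_BENIGN = 226    # test-set benign variants (from the text)

# method: (sensitivity, specificity, reported accuracy), all from Table 1
TABLE_1 = {
    "Missense3D-PPI": (0.44, 0.84, 0.58),
    "Missense3D": (0.08, 0.98, 0.40),
    "MutaBind2": (0.27, 0.90, 0.50),
    "BeAtMuSiC": (0.27, 0.90, 0.50),
    "mCSM-PPI2": (0.14, 0.96, 0.43),
}


def implied_accuracy(sens: float, spec: float,
                     n_pos: int = N_DAMAGING, n_neg: int = N_BENIGN) -> float:
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    for method, (sens, spec, reported) in TABLE_1.items():
        implied = implied_accuracy(sens, spec)
        print(f"{method:15s} implied {implied:.3f}  reported {reported:.2f}")
```

Because damaging variants outnumber benign ones in the test set, sensitivity carries roughly two thirds of the weight, which is why the highly specific but insensitive predictors end up with lower accuracy.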
Case study
Variant p.Ser336Arg causes medium-chain specific acyl-CoA dehydrogenase (ACADM) deficiency (OMIM Disease ID, MIM: 201450).36 This variant is harboured by the human enzyme ACADM (UniProt ID P11310). Serine 336 is located on the surface of the ACADM protein, and its substitution with the large and charged arginine is predicted to be tolerated when assessed using the ACADM single-chain 3D coordinates in Missense3D, because it does not affect the correct folding and stability of the single protein. However, the same variant is predicted to be damaging by Missense3D-PPI when its effect is assessed on the ACADM homotetramer 3D structure (PDB:4p13, Figure S6, top panel). The substitution with arginine introduces an unbalanced charge at the protein interface. Furthermore, it is predicted to cause steric hindrance of ACADM homotetramer assembly.
Another example of the added value of assessing the effect of genetic variants with Missense3D-PPI is p.Gly41Arg in the human enzyme alanine glyoxylate aminotransferase (AGXT; UniProt ID P21549), which causes type I primary hyperoxaluria (MIM 259900). Missense3D-PPI predicts this variant to be damaging through the formation of a new interchain hydrogen bond, the replacement of an interface glycine with any other residue and the introduction of an unbalanced charge at the AGXT homodimeric interface (PDB:4kyo; Figure S6, bottom panel). When analysed on the single unbound AGXT structure, this variant is predicted to be tolerated by Missense3D because this amino acid substitution on the protein surface does not affect the AGXT monomeric protein structure.
Web server
Missense3D-PPI is freely available at https://missense3d.bc.ic.ac.uk/. Two input pages are available to the user. If an experimental structure covering the residue harbouring the variant is available from the PDB, the “Position on Protein Sequence” page can be used. In this case, the protein UniProt identifier and the position of the residue according to UniProt numbering should be provided. If a 3D model (including AlphaFold models) or an experimental structure not yet deposited in the PDB is used for the analysis of the variant, the “Position on 3D Structure” page should be selected. In this case, a 3D coordinate file should be uploaded and the position of the variant specified according to the residue numbering of the 3D structure. Both input pages require knowledge of the chain ID that identifies the query protein in the 3D coordinate file. Example submission entries are provided.
The initial output of Missense3D-PPI informs the user whether structural damage has been identified. The in-depth structural analysis is provided in a separate Results page (Figure 2). The latter allows the user to visualize and manipulate the wild-type and mutant residues on the 3D structure using a JSmol37 interactive window. Furthermore, any structural damage identified is described along with all other structural features analysed.
Figure 2. Missense3D-PPI Result page.
Discussion
We have developed Missense3D-PPI to specifically address the problem of predicting the effect of missense variants occurring on the surface of a protein in a PPI site. Missense3D-PPI requires the 3D coordinates of a protein complex, either experimentally determined or modelled through algorithms such as AlphaFold12 or GWIDD,38 the latter available from the GWYRE database.9 Moreover, additional precomputed 3D models of protein complexes are available from databases such as Interactome3D8 and PrePPI.10 We showed that, for variants located at protein interfaces, Missense3D-PPI outperformed Missense3D and other structure-based variant predictors.
Missense3D-PPI is specifically designed to address the disruption of structural features, such as cavities, clashes or chemical bonds occurring in interchain residues. Such features are not included in our previous algorithm Missense3D, which uses the 3D coordinates of the monomeric protein. Another major difference between Missense3D-PPI and Missense3D is the introduction of interface-specific features, such as replacement of tyrosine and tryptophan, which are important residues for protein–protein interaction.31 To account for these residues being enriched in interaction hot spots, we introduced the qualitative features “Interface Trp replaced” and “Interface Tyr replaced”, when either a tryptophan or a tyrosine interface residue is replaced by any other amino acid. Arginine has also been shown to contribute to PPI.31 However, the replacement of arginine is already captured by features, such as ‘charge replaced’, ‘charge switch’ and ‘salt bridge breakage’, therefore an ‘interface arginine replaced’ feature was not introduced in Missense3D-PPI.
Two additional features, ‘interface Gly replaced’ and ‘interface Pro introduced’, were added in Missense3D-PPI to address potential structural problems which may occur at a protein interface. Indeed, glycine and proline have unique structural features and their replacement/introduction can affect the integrity of a protein interface. Glycine is the smallest residue and its substitution with larger residues may introduce a clash, whereas introduction of a proline may limit the flexibility of the backbone, thus affecting the conformation of the secondary structure of a protein interface.1,39
On our test set, “interface charge replaced”, “interface salt bridge breakage”, “interface Gly replaced”, “interface Tyr replaced” and “interface Gly in a bend” had the best TPR/FPR ratio, which is in line with the known physico-chemical properties of protein interfaces. “Breakage of a disulphide bond” was one of the top-performing features in the training set but, unfortunately, no variant exhibited this feature in the test set. Disulphide bonds play an important role in protein structure stabilization and dimer formation.40,41 For this reason, the feature “disruption of disulphide bond” was kept in the final Missense3D-PPI algorithm.
Protein interfaces are not homogeneous regions, and the properties of interface residues may vary according to their role within the interface, e.g. core or rim, and the type of protein interaction, e.g. transient or permanent. In our dataset, features such as replacement of charged residues and disruption/formation of chemical bonds only performed well when calculated on core residues. Indeed, core residues are more important in stabilizing the PPI than rim residues. Moreover, the performance of Missense3D-PPI should ideally be tested according to the nature of the protein interaction but, to our knowledge, no database of transient and permanent PPIs is currently available. Homodimers have been reported to be nearly always permanent PPIs, whereas heterocomplexes are often transient, yet we did not identify a difference in the performance of Missense3D-PPI in homodimers versus heterocomplexes (data not shown).
Missense3D-PPI was benchmarked against three widely used energy-based programs for interface variants, BeAtMuSiC,16 MutaBind217 and mCSM-PPI2,18 and was significantly more accurate than all three. The added value of Missense3D-PPI compared to other programs includes a detailed description of the structural problem identified and the visualization of the mutant residue on the 3D structure. Moreover, in Missense3D-PPI the user does not need to specify the interacting chain in the 3D coordinate file, since the algorithm automatically calculates all interface residues between the protein of interest and other partners in the 3D coordinate file. This is particularly useful when the 3D coordinate file is that of a multimer.
Missense3D-PPI predictions are solely based on the disruption of structural features and do not include information derived from sequence conservation. As expected, in our dataset damaging variants were more likely than benign variants to occur in highly conserved residues (Figure S7). The atom-based analysis provided by Missense3D-PPI complements the information derived from sequence conservation, which is available from several variant prediction algorithms. We therefore recommend using Missense3D-PPI as an additional tool for variant prediction, in line with the American College of Medical Genetics and Genomics (ACMG) and the UK Association for Clinical Genomic Science (ACGS) Best Practice guidelines on variant classification.42,43 Another limitation is that Missense3D-PPI requires the 3D coordinates of protein complexes. Although the coverage of the human interactome is still quite limited, one can expect a substantial expansion in the 3D coordinates of protein complexes in the near future from developments such as AlphaFold.12 We did not test Missense3D-PPI on models but, because it was developed using the same strategy adopted in Missense3D, we do not expect a significant drop in performance when it is applied to 3D models of complexes.
In conclusion, Missense3D-PPI is a new resource to aid in the prioritization and characterization of genetic variants disrupting protein networks.
Supplementary Material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jmb.2023.168060.
Acknowledgements
This research was funded in part by the Wellcome Trust under grant numbers 104955/Z/14/Z and 218242/Z/19/Z to MJES and AD and which supported AD and GH. CP was supported by the BBSRC grant BB/P023959/1. SAI and MJES were supported by BBSRC grant BB/T010487/1. For the purpose of open access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Footnotes
CRediT authorship contribution statement
Alessia David and Michael J. E. Sternberg: conceived the study and contributed to the interpretation of findings. Cecilia Pennica: collected the data and performed the main analyses. Gordon Hanna and Suhail A. Islam: provided technical support. Alessia David: wrote the first draft of the manuscript. All authors approved the final version of the manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
The data used to develop Missense3D-PPI are available from the Missense3D-PPI website under Dataset and in Supplementary file 1.
References
- 1. David A, Razali R, Wass MN, Sternberg MJE. Protein–protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Hum Mutat. 2012;33:359–363. doi: 10.1002/humu.21656.
- 2. Engin HB, Kreisberg JF, Carter H. Structure-Based Analysis Reveals Cancer Missense Mutations Target Protein Interaction Interfaces. PLoS One. 2016;11:e0152929. doi: 10.1371/journal.pone.0152929.
- 3. Livesey BJ, Marsh JA. The properties of human disease mutations at protein interfaces. PLoS Comput Biol. 2022;18:1–26. doi: 10.1371/journal.pcbi.1009858.
- 4. Sahni N, Yi S, Taipale M, Fuxman Bass JI, Coulombe-Huntington J, Yang F, Peng J, Weile J, et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell. 2015;161:647–660. doi: 10.1016/j.cell.2015.04.013.
- 5. Jubb HC, Pandurangan AP, Turner MA, Ochoa-Montaño B, Blundell TL, Ascher DB. Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health. Prog Biophys Mol Biol. 2017;128:3–13. doi: 10.1016/j.pbiomolbio.2016.10.002.
- 6. Vakser IA. Protein-protein docking: from interaction to interactome. Biophys J. 2014;107:1785–1793. doi: 10.1016/j.bpj.2014.08.033.
- 7. Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, et al. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47:D464–D474. doi: 10.1093/nar/gky1004.
- 8. Mosca R, Céol A, Aloy P. Interactome3D: adding structural details to protein networks. Nat Meth. 2013;10:47–53. doi: 10.1038/nmeth.2289.
- 9. Malladi S, Powell HR, David A, Islam SA, Copeland MM, Kundrotas PJ, Sternberg MJE, Vakser IA. GWYRE: A Resource for Mapping Variants onto Experimental and Modeled Structures of Human Protein Complexes. J Mol Biol. 2022;434. doi: 10.1016/j.jmb.2022.167608.
- 10. Garzón JI, Deng L, Murray D, Shapira S, Petrey D, Honig B. A computational interactome and functional annotation for the human proteome. Elife. 2016;5:e18715. doi: 10.7554/eLife.18715.
- 11. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589. doi: 10.1038/s41586-021-03819-2.
- 12. Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun. 2022;13:1265. doi: 10.1038/s41467-022-28865-w.
- 13. Ittisoponpisan S, Islam SA, Khanna T, Alhuzimi E, David A, Sternberg MJE. Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? J Mol Biol. 2019;431:2197–2212. doi: 10.1016/j.jmb.2019.04.009.
- 14. Venselaar H, Te Beek TAH, Kuipers RKP, Hekkelman ML, Vriend G. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinform. 2010;11:548. doi: 10.1186/1471-2105-11-548.
- 15. Al-Numair NS, Martin ACR. The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations. BMC Genom. 2013;14(Suppl 3):S4. doi: 10.1186/1471-2164-14-S3-S4.
- 16. Dehouck Y, Kwasigroch JM, Rooman M, Gilis D. BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations. Nucleic Acids Res. 2013;41:W333–W339. doi: 10.1093/nar/gkt450.
- 17. Zhang N, Chen Y, Lu H, Zhao F, Alvarez RV, Goncearenco A, Panchenko AR, Li M. MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions. iScience. 2020;23. doi: 10.1016/j.isci.2020.100939.
- 18. Rodrigues CHM, Myung Y, Pires DEV, Ascher DB. mCSM-PPI2: predicting the effects of mutations on protein-protein interactions. Nucleic Acids Res. 2019;47:W338–W344. doi: 10.1093/nar/gkz383.
- 19. Jankauskaitė J, Jiménez-García B, Dapkūnas J, Fernández-Recio J, Moal IH. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics. 2019;35:462–469. doi: 10.1093/bioinformatics/bty635.
- 20. Nikam R, Kulandaisamy A, Harini K, Sharma D, Gromiha MM. ProThermDB: thermodynamic database for proteins and mutants revisited after 15 years. Nucleic Acids Res. 2021;49:D420–D424. doi: 10.1093/nar/gkaa1035.
- 21. Fang J. A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief Bioinform. 2020;21:1285–1292. doi: 10.1093/bib/bbz071.
- 22. Yang Y, Urolagin S, Niroula A, Ding X, Shen B, Vihinen M. PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality. Int J Mol Sci. 2018;19:1009. doi: 10.3390/ijms19041009.
- 23. Khanna T, Hanna G, Sternberg MJE, David A. Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants. Hum Genet. 2021;140:805–812. doi: 10.1007/s00439-020-02246-z.
- 24. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46:D1062–D1067. doi: 10.1093/nar/gkx1153.
- 25. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2020;49:D480–D489. doi: 10.1093/nar/gkaa1100.
- 26. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
- 27. Burley SK, Bhikadiya C, Bi C, Bittrich S, Chen L, Crichlow GV, Christie CH, Dalenberg K, et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2021;49:D437–D451. doi: 10.1093/nar/gkaa1038.
- 28. Choo SY. The HLA system: genetics, immunology, clinical testing, and clinical implications. Yonsei Med J. 2007;48:11–23. doi: 10.3349/ymj.2007.48.1.11.
- 29. Thom CS, Dickson CF, Gell DA, Weiss MJ. Hemoglobin Variants: Biochemical Properties and Clinical Correlates. Cold Spring Harb Perspect Med. 2013;3:a011858. doi: 10.1101/cshperspect.a011858.
- 30. Bower MJ, Cohen FE, Dunbrack RL. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J Mol Biol. 1997;267:1268–1282. doi: 10.1006/jmbi.1997.0926.
- 31. Chakrabarti P, Janin J. Dissecting protein–protein recognition sites. Proteins: Struct, Function, Bioinformatics. 2002;47:334–343. doi: 10.1002/PROT.10085.
- 32. Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinf. 2019;20:1–15. doi: 10.1186/s12859-019-3019-7.
- 33. Gerasimavicius L, Liu X, Marsh JA. Identification of pathogenic missense mutations using protein stability predictors. Sci Rep. 2020;10:15387. doi: 10.1038/s41598-020-72404-w.
- 34. Dietterich TG. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput. 1998;10:1895–1923. doi: 10.1162/089976698300017197.
- 35. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12:153–157. doi: 10.1007/BF02295996.
- 36. Andresen BS, Jensen TG, Bross P, Knudsen I, Winter V, Kølvraa S, Bolund L, Ding JH, et al. Disease-causing mutations in exon 11 of the medium-chain acyl-CoA dehydrogenase gene. Am J Hum Genet. 1994;54:975–988.
- 37. Willighagen E, Howard M. Fast and Scriptable Molecular Graphics in Web Browsers without Java3D. Nat Prec. 2007:1. doi: 10.1038/npre.2007.50.1.
- 38. Kundrotas PJ, Zhu Z, Vakser IA. GWIDD: a comprehensive resource for genome-wide structural modeling of protein-protein interactions. Hum Genomics. 2012;6:7. doi: 10.1186/1479-7364-6-7.
- 39. Gao M, Zhou H, Skolnick J. Insights into Disease-Associated Mutations in the Human Proteome through Protein Structural Analysis. Structure. 2015;23:1362–1369. doi: 10.1016/j.str.2015.03.028.
- 40. Li J, Yen TY, Allende ML, Joshi RK, Cai J, Pierce WM, Jaskiewicz E, Darling DS, et al. Disulfide bonds of GM2 synthase homodimers. Antiparallel orientation of the catalytic domains. J Biol Chem. 2000;275:41476–41486. doi: 10.1074/jbc.M007480200.
- 41. McAuley A, Jacob J, Kolvenbach CG, Westland K, Lee HJ, Brych SR, Rehder D, Kleemann GR, et al. Contributions of a disulfide bond to the structure, stability, and dimerization of human IgG1 antibody CH3 domain. Protein Sci. 2008;17:95–106. doi: 10.1110/ps.073134408.
- 42. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, et al. ACMG Laboratory Quality Assurance Committee, Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–424. doi: 10.1038/gim.2015.30.
- 43. Ellard S, Baple EL, Callaway A, Berry I, Forrester N, Turnbull C, Owens M, Eccles DM, et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease 2020. 2021.


