Abstract
Next-generation sequencing of human genomes reveals millions of missense variants, some of which may lead to loss of protein function and ultimately disease. Here, we investigate missense variants in membrane proteins—key drivers in cell signaling and recognition. We find enrichment of pathogenic variants in the transmembrane region across 19,000 functionally classified variants in human membrane proteins. To accurately predict variant consequences, one fundamentally needs to understand the underlying molecular processes. A key mechanism underlying pathogenicity in missense variants of soluble proteins has been shown to be loss of stability. Membrane proteins, however, are widely understudied. Here, we interpret variant effects on a larger scale by performing structure-based estimations of changes in thermodynamic stability using a membrane-specific energy function and analyses of sequence conservation during evolution of 15 transmembrane proteins. We find evidence for loss of stability being the cause of pathogenicity in more than half of the pathogenic variants, indicating that this is a driving factor also in membrane-protein-associated diseases. Our findings show how computational tools aid in gaining mechanistic insights into variant consequences for membrane proteins. To enable broader analyses of disease-related and population variants, we include variant mappings for the entire human proteome.
Significance
Genome sequencing is revealing thousands of variants in each individual, some of which may increase disease risks. In soluble proteins, stability calculations have successfully been used to identify variants that are likely pathogenic due to loss of protein stability and subsequent degradation. This knowledge opens up potential treatment avenues. Membrane proteins form about 25% of the human proteome and are key to cellular function; however, calculations for disease-associated variants have not systematically been tested on them. Here, we present a new protocol for stability calculations on membrane proteins using a membrane-specific energy function, together with a proof-of-principle application to 15 proteins with disease-associated variants. We integrate stability calculations with analysis of sequence evolution, allowing us to separate variants where loss of stability is the most likely mechanism from those where other protein properties such as ligand binding are affected.
Introduction
Proteins carry out the majority of functions in a cell. While most proteins are robust to some sequence changes (1), other single amino acid variants may render them nonfunctional. For nuclear and cytosolic proteins, we and others have shown that the molecular reason underlying loss of function for human pathogenic variants is often loss of protein stability (2,3,4,5,6,7,8,9,10). Proteins affected by such destabilizing variants are recognized by the cellular protein quality control system, leading to degradation and hence low levels that cause a loss-of-function phenotype (11). For soluble proteins, structure-based calculations of stability changes upon mutation (ΔΔG) (12) correlate with experimental stability (13,14,15,16) as well as high-throughput abundance measurements (17,18), allowing us to annotate variants accordingly (19). The loss of stability induced by such variants often leads to cellular protein degradation. This mechanistic link to degradation is not only interesting from a biophysical perspective but can also lead to development of treatments that rescue the variant from degradation (20,21).
Twenty-three percent of genes in the human proteome encode membrane proteins, including channels, transporters, enzymes, and receptors such as G-protein coupled receptors (GPCRs) (22). Located at the junction between two compartments and often exposed to small molecules in the bloodstream, membrane proteins are key in cell signaling and recognition, as well as major drug targets (23,24). Variants in membrane proteins are associated with a number of diseases, including, for example, cystic fibrosis, Parkinson’s, Alzheimer’s, and atherosclerosis (25,26,27,28).
Studying membrane proteins experimentally or computationally is challenging, as the proteins need to be considered in context of the lipid membrane (29). Furthermore, while many soluble proteins can unfold and refold reversibly, the processes of synthesis, folding, and assembly are intrinsically linked for membrane proteins (30,31). In particular, denaturants can perturb properties of the membrane (or its mimetics) when thermodynamic stability measurements are performed in (near) native conditions. More recent techniques such as steric trapping or label-free differential scanning fluorimetry aim to avoid those drawbacks but cannot be applied in a high-throughput manner (32,33). Therefore, large-scale and easily accessible experimental data for benchmarking computational tools are sparse. Despite recent methodological advances (34), computational methods for membrane proteins are not as developed as those for soluble proteins. Furthermore, the diverse experimental studies measure different levels of unfolding, which further challenges computational method development. Thus, the application of computational analyses for examining a potential correlation between protein stability, cellular abundance, and function analogous to that for soluble proteins may be particularly challenging for membrane proteins.
Building on recent energy function developments that make computational analysis of membrane proteins more realistic (35), we here set out to assess whether calculations of the change in folding free energy can be used to identify the subset of pathogenic variants that are likely caused by loss of stability. In particular, we calculate the change in folding energy between a wild-type protein and a protein variant (ΔΔG), where low ΔΔG values correspond to substitutions that, in light of stability, appear well tolerated, and high ΔΔG values correspond to variants that destabilize the protein structure. Of note, the levels of unfolding or destabilization in vivo do not necessarily have to lead to complete protein unfolding. Partial unfolding may be sufficient to trigger recognition by the protein quality control system. We first combined several protein annotation databases to obtain an overview of the number and types of missense variants that are found in membrane proteins. We then analyzed in more detail 15 human membrane proteins for which high-resolution structural data as well as annotations of pathogenic and benign variants were available, and calculated ΔΔG values for them. In addition, we used an evolutionary sequence analysis approach (36) to calculate a value, which we term ΔΔE, indicating the evolutionary importance of each residue. This and similar approaches have been shown to be useful in detecting detrimental variants and include both loss-of-stability variants and variants that lose function due to, for example, catalytic impairment of enzymes or mistrafficking (37,38,39). Multiple recent works have demonstrated that sequence analysis of conservation is able to capture such detrimental variants with high accuracy (37,40,41). As is the case for many predictors, the mechanistic reasons why a variant is not tolerated by evolution, whether gain or loss of function or other aspects such as loss of stability, are not directly apparent from the score.
In the following, we associate high ΔΔE with loss of function to facilitate reading. In this work, we use it in combination with loss of stability for dissection of underlying causes. The combination of ΔΔG and ΔΔE has proven particularly useful for providing mechanistic insight into loss-of-function variants in soluble proteins (18,19). Here, we apply such a combined analysis to gain mechanistic insight into variant consequences in 15 selected membrane proteins.
Methods and materials
Collection and processing of clinical, population, and structural data
To extract all annotated human membrane proteins, we first obtained all unique proteins (UniProt-ID) of the human proteome (organism = homo sapiens) from the UniProt (https://www.uniprot.org/help/api) (42) and EMBL-EBI (https://www.ebi.ac.uk/proteins/api/doc/) databases. For each UniProt-ID, we then stored its general and amino acid-based annotations (such as protein domain regions) in UniProt and further selected proteins of the type “TRANSMEM,” “INTRAMEM,” “TOPODOM,” or “LIPID.” This annotation originates from assignment of structural properties or predictions by TMHMM (https://www.uniprot.org/help/topo_dom). The UniProt-ID of the first transcript is used in the further mapping and analysis.
We then further filtered the UniProt-ID list so that all remaining proteins have at least one ClinVar (43) or gnomAD (44) missense variant. gnomAD data were taken from an in-house database built on exome data from gnomAD v2 and whole-genome data from gnomAD v3 (scripts available at https://github.com/KULL-Centre/PRISM/tree/main/software/make_prism_files, release-tag v0.1.1). The database was generated by first downloading the vcf files (May 2021), selecting exome GRCh38 liftover for v2 and whole-genome files for v3. The vcf files were then annotated with the Variant Effect Predictor using the GRCh38 release 100 human transcripts set from Ensembl. From the annotated vcfs, we established allele frequencies for all protein-level variants, separately in exome and genome data, taking the variant allele count as the sum over all DNA variants leading to the same protein-level variant. ClinVar data were obtained by parsing the following file: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz (May 2021) and only admitting entries that have a rating of at least one star, are single-nucleotide variants, and are mapped to GRCh38 (script available at https://github.com/KULL-Centre/PRISM/tree/main/software/make_prism_files, release-tag v0.1.1). The data set for the entire human proteome is provided at https://sid.erda.dk/sharelink/c3rDfqR8nn, using a UniProt-AC-based directory structure, e.g., files for GTR1 (UniProt: P11166) can be found in subdirectory prism/P1/11/66/.
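The directory layout mentioned above can be illustrated with a small helper. This is only a sketch: it assumes that the pattern shown for P11166 (the accession split into two-character chunks) holds for all six-character accessions, and the function name `prism_subdir` is hypothetical rather than part of the PRISM scripts.

```python
def prism_subdir(uniprot_ac: str) -> str:
    """Return the data subdirectory for a six-character UniProt accession.

    Assumption: the layout generalizes from the single example given in the
    text (P11166 -> prism/P1/11/66/). Longer accessions are not covered.
    """
    ac = uniprot_ac.strip().upper()
    if len(ac) != 6:
        raise ValueError("this sketch only covers six-character accessions")
    # Split the accession into consecutive two-character chunks.
    chunks = [ac[i:i + 2] for i in range(0, 6, 2)]
    return "prism/" + "/".join(chunks) + "/"


print(prism_subdir("P11166"))
```

For the GTR1 example from the text, the helper reproduces the stated subdirectory.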
Next, we extracted all PDB-IDs from RCSB and PDBe with a matching UniProt-ID reference. As not every PDB-ID for a given UniProt-ID in PDBe could be found in RCSB PDB and vice versa, we searched both databases. We further included phenotype and genomic disease annotations from OMIM via mim2gene (https://omim.org/static/omim/data/mim2gene.txt) and MIM, including the proteins’ chromosome information.
The sequences were then aligned to the UniProt sequence using pairwise2.align.globalds (with BLAST defaults) from Biopython (45), a minimal identity of 0.6, and minimal coverage of 0.1 for alignment acceptance. All residues that do not match the UniProt sequence were discarded. The final data contained each protein sequence, its UniProt-ID, the secondary structure prediction by residue, solvent-accessible surface area for each wild-type residue, and UniProt annotations such as transmembrane region, protein modifications, total allele frequency counts from gnomAD, ClinVar significance statements, genomic disease annotations, and associated PDB-IDs.
Selection of targets used for computational predictions
To find a set of proteins for our computational sequence and structure analyses, we selected all proteins that have at least one benign and one pathogenic ClinVar annotation in an experimentally resolved transmembrane region of the protein. This reduced the number of proteins with gnomAD or ClinVar annotations from 1504 proteins to 41 proteins. As the Rosetta membrane energy function has been developed and benchmarked on structures resolved by x-ray crystallography, we selected those, reducing the protein set to 16. The selected proteins are listed in Table 1, additional supplemental Table 1 in the supporting material and supporting data at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al.
Table 1.
Name | Class | Length (all) | Length (TM) | Before filtering: group A | Before filtering: group B | Before filtering: benign | After filtering: group A | After filtering: group B | After filtering: benign | GEMME: MSA depth | GEMME: Nf (neff/√L) | AUC (ΔΔG) | AUC (ΔΔE) | MIM phenotype |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NPC1 | transporter | 1278 | 277 | 60 | 36 | 13 | 44 | 12 | 5 | 1486 | 31.43 | 0.69 | 0.84 | Niemann-Pick disease |
OPSD | GPCR | 348 | 161 | 67 | 10 | 2 | 41 | 6 | 2 | 1183 | 49.16 | 0.79 | 0.69 | night blindness; retinitis punctata albescens; retinitis pigmentosa |
GTR1 | transporter | 492 | 261 | 56 | 9 | 4 | 42 | 7 | 2 | 1772 | 65.62 | 0.65 | 0.92 | dystonia; GLUT1 deficiency syndrome; epilepsy |
AT2A2 | transporter | 1042 | 204 | 12 | 9 | 7 | 8 | 5 | 4 | 1924 | 44.33 | 0.85 | 1 | acrokeratosis verruciformis; Darier disease |
ACHB2 | ion channel | 502 | 86 | 3 | 16 | 12 | 1 | 5 | 3 | 1789 | 57.23 | 0.8 | 1 | epilepsy |
CXB2 | cell junction | 226 | 83 | 60 | 23 | 8 | 33 | 16 | 5 | 1351 | 82.1 | 0.42 | 0.84 | deafness; Bart-Pumphrey syndrome; Vohwinkel syndrome; etc. |
S5A2 | enzyme | 254 | 84 | 21 | 11 | 5 | 18 | 8 | 5 | 1177 | 60.18 | 0.63 | 0.9 | pseudovaginal perineoscrotal hypospadias |
MC4R | GPCR | 332 | 165 | 12 | 11 | 2 | 10 | 8 | 1 | 1265 | 54.02 | 0.66 | 0.9 | obesity |
AQP2 | ion channel | 271 | 124 | 10 | 2 | 1 | 7 | 1 | 1 | 1326 | 62.36 | 1 | 0.86 | diabetes insipidus |
ACHA4 | ion channel | 627 | 85 | 3 | 47 | 32 | 3 | 6 | 4 | 1107 | 27.84 | 0.33 | 0.94 | epilepsy; nicotine addiction |
JAGN1 | transporter | 183 | 84 | 5 | 4 | 3 | 4 | 3 | 3 | 196 | 13.45 | 0.42 | 0.92 | neutropenia |
SMO | GPCR | 787 | 147 | 1 | 25 | 6 | 1 | 7 | 1 | 823 | 17.7 | 1 | 0.86 | Curry-Jones syndrome; Pallister-Hall-like syndrome; basal cell carcinoma |
ABCG8 | transporter | 673 | 126 | 1 | 35 | 11 | 0 | 15 | 3 | 86 | 2.14 | – | – | gallbladder disease; sitosterolemia |
ABCG5 | transporter | 651 | 127 | 1 | 27 | 9 | 0 | 0 | 0 | 31 | 0.81 | – | – | sitosterolemia |
GPT | enzyme | 408 | 228 | 7 | 5 | 1 | 6 | 2 | 1 | 1812 | 64.45 | 0.5 | 0.92 | congenital disorder of glycosylation; myasthenic syndrome |
FZD4 | GPCR | 537 | 206 | 5 | 13 | 6 | 2 | 3 | 2 | 1055 | 37.22 | 0.83 | 1 | exudative vitreoretinopathy; retinopathy of prematurity |
Total | | | | 324 | 283 | 122 | 220 | 104 | 42 | | | 0.64 | 0.82 | |
MSA, multiple sequence alignment; TM, transmembrane.
The structures for each of the chosen proteins were selected according to their structure selection score (StrucSelscore) and the number of variants in total and within the transmembrane region. The StrucSelscore is a combination of method resolution, sequence coverage and identity to the experiment and the wild-type (according to UniProt), including an annotation about inserts, deletions, mismatches, nonobserved residues, and modified residues. The script is available at https://github.com/KULL-Centre/PRISM/blob/main/software/scripts/struc_select_sifts.py and the table with the numbers for each of the proteins at https://github.com/KULL-Centre/papers/blob/main/2022/hMP-Xray-Tiemann-et-al/data (∗date∗-counthMPannosplitPDBXraypublish.xlsx).
Conservation analysis of variant effects
To calculate the effect of a variant in light of evolution, we used the global epistatic model for predicting mutational effects (GEMME) algorithm (36) as described previously (19): we first construct a multiple sequence alignment (MSA) using the sequence of the first transcript of each protein's UniProt-ID as input to HHBlits (v2.0.15) (46) with the following settings -e 1e-10 -i 1 -p 40 -b 1 -B 20000 to search UniRef30hhsuite.tar.gz (47,48,49). The MSA is filtered by keeping only positions present in the target sequence and sequences with less than 50% gaps. We then follow the GEMME algorithm, which predicts the degree of conservation for all 19 substitutions at each position (ΔΔE). We rank-normalized the ΔΔE values for the entire protein to allow comparison with the other proteins in the data set. Low ΔΔE values correspond to well-tolerated substitutions, whereas high ΔΔE values correspond to rare or absent variants. In addition, we extract the sequence coverage of the MSA for each position. To understand the effects of those filtering processes, we calculated the GEMME score against the neff coverage per protein for different filtering steps (see Fig. S1 in the supporting material).
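The two post-processing steps described above, MSA filtering and rank normalization, can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, the gap fraction is computed after trimming to target positions (the text does not specify the order), and ties in the rank normalization share their average rank (one common convention, also not specified in the text).

```python
def filter_msa(msa):
    """Filter an alignment given as equal-length strings; msa[0] is the target.

    Keeps only columns where the target has a residue, then keeps only
    sequences with less than 50% gaps over those columns.
    """
    target = msa[0]
    keep = [i for i, char in enumerate(target) if char != "-"]
    trimmed = ["".join(seq[i] for i in keep) for seq in msa]
    return [seq for seq in trimmed if seq.count("-") / len(seq) < 0.5]


def rank_normalize(scores):
    """Map scores onto [0, 1] by rank; tied scores share their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with scores[order[i]].
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2  # average 0-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.0 for r in ranks]


# Toy alignment and toy scores, invented for illustration only.
print(filter_msa(["AC-D", "A-GD", "--G-"]))
print(rank_normalize([0.3, 0.1, 0.2]))
```

Rank normalization discards the absolute scale of the scores, which is what makes values comparable between proteins whose alignments differ in depth.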
Thermodynamic stability predictions
To calculate changes in thermodynamic stability (ΔΔG), we use Rosetta version v2021.31-dev61729-0-gc7009b3115c (GitHub sha1 c7009b3115c22daa9efe2805d9d1ebba08426a54). We implemented an in-house pipeline to perform preparation, relaxation, and ΔΔG calculations of the protein (https://github.com/KULL-Centre/PRISM/tree/main/software/rosetta_ddG_pipeline, release-tag v0.1.1). Preparation includes cleaning of the PDB structure coordinates (hereafter referred to as structure) of ligands and alternative rotamers and chains, superposing of the protein into the membrane plane as well as calculation of the membrane plane, lipid-accessible residues (50), and the solvent-accessible surface area using DSSP (51,52) (the latter is solely used for analysis purposes).
To utilize the membrane protein mode in Rosetta, two conditions must be met: first, a membrane plane file, containing the residues that are within the membrane, needs to be provided; second, the structure of the protein must be centered and oriented within the membrane, where the membrane thickness follows the z axis. The membrane plane can be calculated using a membrane-aligned protein structure. Therefore, protein coordinate translation was performed by structural superposition of the protein to its equivalent structure obtained from the Orientations of Proteins in Membranes database (53), which lies already within those coordinates. (If the chosen PDBid is not present in the Orientations of Proteins in Membranes database, an alternative structure for the same protein or a close homolog is chosen.) Next, the membrane plane was calculated using Rosetta (54,55) and the protein structure was relaxed as described in (56). Finally, ΔΔG values for each variant were calculated as the energy of the variant minus the energy of the wild-type.
We performed a benchmark to identify the best protocol to calculate ΔΔG values for membrane proteins. First, we collected 20 experimentally derived ΔΔG data sets covering a total of 8 different membrane proteins (Table S1 in the supporting material). Then, we implemented three different protocols, namely MPrepack, MPflexrelaxddG, and “cartprot”, inspired by work on soluble proteins (12) and previously published work on membrane proteins (35). MPrepack operates in torsion space and performs a simple repacking of the side chains within a defined radius after mutagenesis (following the protocol mentioned in (35)). MPflexrelaxddG is inspired by (12) and allows more flexibility to accommodate the variant by allowing backbone relaxation of the variant and its sequential neighbors, in addition to repacking of side chains within a defined radius. cartprot follows the same protocol as MPflexrelaxddG but is executed in cartesian space. For all protocols, we used the membrane protein score function “franklin2019” (35) that performs comparably with older membrane scoring functions as recently evaluated in (56). Finally, we selected cartprot as the computed values gave the best correlation with the experimental data (0.46) and, additionally, the computed values have a high reproducibility, indicated by the low standard deviation for replicates (Fig. S2 in the supporting material). As mentioned in the “limitations of the study” section, the correlation of independent experimental studies on the same protein and the same variants (57,58) is 0.65.
Enrichment of benign variant counts by gnomAD allele frequency
To evaluate how well our computational methods predict variants to be benign or pathogenic using receiver-operator characteristic (ROC) analysis, we aimed for a large number of benign and pathogenic variants. In our target proteins, we have 324 pathogenic but only 122 benign variants. We therefore aimed to supplement benign variants with variants from gnomAD. To this end, we performed an ROC analysis of the gnomAD allele frequency on the 10,260 benign and 2360 pathogenic ClinVar variants in the human membrane proteome that also have a gnomAD allele frequency (Fig. 1 C) and obtained an area under the curve (AUC) of 0.96 (Fig. S3 in the supporting material). This analysis enables calculating a cutoff to separate benign from pathogenic variants using gnomAD allele frequency via the highest Youden index (59). We thereby obtain an allele frequency cutoff (Fig. 1 B). We define group B variants as the union of those that are defined by ClinVar as benign and those variants that have an allele frequency above this cutoff and are not pathogenic (in ClinVar). Consequently, we call pathogenic variants group A. For our target proteins, this results in 324 group A and 283 group B (benign and/or nonrare) variants across 16 proteins.
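Selecting a cutoff via the highest Youden index (J = sensitivity + specificity − 1, equivalently true-positive rate minus false-positive rate) can be sketched as follows. This is a minimal stdlib illustration, not the analysis code used here: the function name is hypothetical, benign is treated as the positive class because benign variants tend to have higher allele frequencies, and the toy frequencies are invented.

```python
def youden_cutoff(values, is_positive):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A value is predicted positive when value >= cutoff. Both classes must be
    present. Candidate cutoffs are the observed values themselves.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cutoff)
        fp = sum(1 for v, p in zip(values, is_positive) if not p and v >= cutoff)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented allele frequencies; True marks a benign variant.
frequencies = [1e-6, 2e-6, 5e-5, 1e-3, 2e-3, 1e-2]
benign = [False, False, False, True, True, True]
print(youden_cutoff(frequencies, benign))
```

On real data the two classes overlap, so the maximum J is below 1 and the chosen cutoff trades sensitivity against specificity.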
Filtering criteria for variant analysis of the 16 target proteins
Prior to analysis, we defined filtering criteria for the calculated ΔΔG and ΔΔE values to obtain a set of variants with reliable scores. First, only variants for which both ΔΔG and ΔΔE calculations are available were selected. Second, special residues involved in disulfide bonds or known modified residues (such as those that bind covalent ligands or palmitoylated residues) were excluded. Further, variants with a low MSA sequence coverage of fewer than 50 sequences were excluded. Finally, variants at positions with a positive wild-type Rosetta energy were excluded from further analyses, as those residue conformers are likely to favor any substitution to reduce their energy, likely due to limitations of the Rosetta energy function. By applying all filters, we obtain a final set of 15 proteins with 220 pathogenic (group A) and 104 benign and/or nonrare (group B) variants, of which 42 are benign (see Table 1 and for the single filtering steps additional data at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al).
Definition of ΔΔG and ΔΔE thresholds and quadrant classification
To analyze the variants in terms of their ΔΔE and ΔΔG scores, we defined cutoffs for each method based on the optimal ROC value (tradeoff of high specificity versus high sensitivity) to separate group A (pathogenic) and group B (benign and/or nonrare) variants, in a similar fashion as done for gnomAD allele frequency above. Next, we defined four categories dependent on their position along the ΔΔE and ΔΔG axes:
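The four filtering criteria can be written as a single predicate. The sketch below is illustrative only: the `Variant` record and its field names are hypothetical, and the boundary handling (coverage of exactly 50 is kept, a wild-type energy of exactly zero is kept) is an assumption drawn from the wording "fewer than 50" and "positive".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    ddg: Optional[float]    # stability change, None if not calculated
    dde: Optional[float]    # evolutionary score, None if not calculated
    msa_coverage: int       # number of MSA sequences covering the position
    wt_energy: float        # Rosetta energy of the wild-type residue
    special_residue: bool   # disulfide-bonded or otherwise modified residue


def passes_filters(v: Variant, min_coverage: int = 50) -> bool:
    """Apply the four criteria from the text to one variant."""
    return (
        v.ddg is not None and v.dde is not None  # both scores available
        and not v.special_residue                # no disulfide/modified residues
        and v.msa_coverage >= min_coverage       # enough aligned sequences
        and v.wt_energy <= 0                     # wild-type energy not positive
    )


# Invented variants for illustration.
variants = [
    Variant("A10V", 2.1, 0.8, 300, -1.2, False),
    Variant("C55S", 1.0, 0.9, 300, -0.5, True),
    Variant("G90R", None, 0.7, 300, -2.0, False),
]
print([v.name for v in variants if passes_filters(v)])
```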
- quadrant (I) variants have high ΔΔE and ΔΔG and are likely to cause loss of function via loss of stability;
- quadrant (II) variants have high ΔΔE and low ΔΔG and cause loss of function for other reasons than loss of stability;
- quadrant (III) variants have low ΔΔE and low ΔΔG and are from a structural and evolutionary perspective expected to be tolerated;
- quadrant (IV) variants have low ΔΔE and high ΔΔG and are from an evolutionary perspective expected to be tolerated but from a structural perspective expected to cause instability.
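The quadrant assignment above amounts to two threshold comparisons. In the sketch below, the function name and the cutoff values in the example call are placeholders, since the actual cutoffs come from the ROC analysis; treating a score equal to the cutoff as "high" is also an assumption.

```python
def quadrant(dde: float, ddg: float, dde_cutoff: float, ddg_cutoff: float) -> str:
    """Classify a variant by its evolutionary score and stability change."""
    high_e = dde >= dde_cutoff
    high_g = ddg >= ddg_cutoff
    if high_e and high_g:
        return "I"    # loss of function, likely via loss of stability
    if high_e:
        return "II"   # loss of function for reasons other than stability
    if not high_g:
        return "III"  # expected to be tolerated
    return "IV"       # evolutionarily tolerated but predicted destabilizing


# Placeholder cutoffs (0.5 and 1.0) chosen only for this example.
for scores in [(0.8, 3.0), (0.8, 0.2), (0.1, 0.2), (0.1, 3.0)]:
    print(scores, quadrant(*scores, dde_cutoff=0.5, ddg_cutoff=1.0))
```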
Definition of protein regions
For analysis purposes, we assigned residues into different regions based on their solvent accessibility and their positioning within the membrane (TM-region). Relative solvent accessibility was calculated using DSSP (51,52) with a cutoff of 0.3, placing residues with a smaller value into the category of buried residues. The positioning within the membrane was obtained as described above. We can thereby divide the protein into four regions:
- Buried: residues with a relative solvent accessibility (DSSP) < 0.3 that are placed outside the membrane; this cutoff places residues within contacts as buried although they might be close to the surface of the protein
- Solvent-accessible: residues with a relative solvent accessibility (DSSP) ≥ 0.3 that are placed outside the membrane
- TM-region (buried): buried residues that are placed within the membrane
- TM-region (solvent-accessible): solvent-accessible residues within the membrane
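The four-way assignment can be expressed as a small function of the relative solvent accessibility and a membrane flag. This is a sketch with a hypothetical function name and illustrative label strings; it assumes the relative accessibility and the in-membrane flag have already been computed per residue.

```python
def assign_region(rsa: float, in_membrane: bool, cutoff: float = 0.3) -> str:
    """Assign a residue to one of the four regions.

    rsa: relative solvent accessibility (0 to 1); values below the cutoff
    count as buried.
    """
    buried = rsa < cutoff
    if in_membrane:
        return "TM-region (buried)" if buried else "TM-region (solvent-accessible)"
    return "buried" if buried else "solvent-accessible"


print(assign_region(0.12, in_membrane=True))
print(assign_region(0.30, in_membrane=False))
```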
For additional analysis, we divide residues by whether they are oriented toward the lipids or not. To assess whether a residue faces toward the lipids, we used a dedicated Rosetta function that returns a true or false value (50). Most of those residues are within the transmembrane region, but there can be exceptions in which nontransmembrane residues (either solvent accessible or buried) face the lipids by “dipping” into the membrane plane. For Fig. S4, C and D in the supporting material, we expanded the regions above by three more, which are subsections of the above (with overlaps, see Fig. S4 in the supporting material):
- TM-region (lipid-facing, buried): buried residues within the membrane that are oriented toward the lipids
- TM-region (lipid-facing, solvent-accessible): solvent-accessible residues within the membrane that are oriented toward the lipids
- Others: combination of residues that are rare and few in number, such as TM-region (solvent-accessible), lipid-facing (solvent-accessible), or lipid-facing (buried)
Utilized software
- python3, including the following third-party libraries: adjustText, Biopython, circlify, matplotlib, numpy, pandas, seaborn, scipy, sklearn, squarify, xmltodict, nglview
- Rosetta version 2021.31 + HEAD.c7009b3115c (c7009b3115c22daa9efe2805d9d1ebba08426a54, default.linuxgccrelease mode)
- Scripts available at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al
  - hMP statistics (hMP_stats.ipynb)
  - ΔΔG pipeline benchmark (MP_ddG_benchmark.ipynb)
  - Xray subset calculations (Xray_subset-calc.ipynb)
  - Xray subset analysis (Xray_subset-ana.ipynb)
- Pipelines
  - PRISM_tools (https://github.com/KULL-Centre/PRISM/software/rosetta_ddG_pipeline, release version v0.1.1)
  - PrismData, FillVariants, and struc_select_sifts (https://github.com/KULL-Centre/PRISM/software/scripts, release version v0.2.2)
- Others
  - overleaf.com
  - Affinity Designer
Results and discussion
Variant annotations in human membrane proteins
We first set out to obtain an overview of the presence and properties of missense variants in human membrane proteins. We searched UniProt (42) for keywords such as TRANSMEM (see methods and materials for details) and used the results to define a list of 5522 proteins that are thought to be embedded in the membrane (Fig. 1 A). We subsequently searched the gnomAD (44) and ClinVar (43) databases for missense variants in the genes encoding these proteins (see methods and materials for additional details). gnomAD is a database aggregating the variants observed in ∼150,000 exome and genome sequences, and thus provides a relatively unbiased view of the variants that are present in the human population (44). ClinVar is a database containing, among other things, missense variants that have been categorized as benign, pathogenic, or variants of uncertain significance, the latter indicating that the pathophysiological consequences of the variant are not clear (43). We obtain almost 1.9 million variants in total for human membrane proteins, which makes up 29% of all human protein variant annotations (see Table 2). Almost all (98.1%) membrane proteins have at least one variant in gnomAD, and about half (44.0%) have at least one variant in ClinVar (Fig. 1 A). Across the two data sets, we find unique variants in 5471 membrane proteins. We excluded synonymous and indel variants, which make up 0.3% (5403 variants) from any further analysis. Nearly all (99.1%) of the nonsynonymous variants are either from gnomAD or are assigned as variants of uncertain significance in ClinVar, and only 19,089 of the 1.9 million variants have an assigned status of being pathogenic or benign (Fig. 1 A and Table 2), highlighting the scope of the problem of determining variant effects. Thirty-eight percent of all human pathogenic variants are found within membrane proteins (see Table 2), underlining the importance of method development suited for this protein class.
Table 2.
Variant class | All human proteins | Membrane proteins: all | Membrane proteins: extracellular | Membrane proteins: cytoplasmic | Membrane proteins: transmembrane | Membrane proteins: other |
---|---|---|---|---|---|---|
Total | 6,526,797 | 1,867,856 | 574,211 | 447,435 | 258,366 | 587,844 |
Benign | 36,770 | 11,063 | 3489 | 3544 | 961 | 3069 |
Benign gnomAD | 33,944 | 10,260 | 3214 | 3326 | 908 | 2812 |
Pathogenic | 21,107 | 8026 | 2327 | 2233 | 1863 | 1603 |
Pathogenic gnomAD | 6200 | 2360 | 632 | 628 | 454 | 646 |
VUS | 217,726 | 64,584 | 17,451 | 25,119 | 6814 | 15,200 |
VUS gnomAD | 116,325 | 36,700 | 9907 | 14,693 | 3574 | 8526 |
Only gnomAD | 6,251,194 | 1,784,183 | 550,944 | 416,539 | 248,728 | 567,972 |
For membrane proteins, the total counts are further divided into each of the cellular regions they occur in. VUS (ClinVar) includes conflict variants. VUS, variants of uncertain significance.
Variants that are pathogenic are expected to be depleted in the human population compared with those that are benign, and indeed we find a clear separation of the distributions of allele frequencies between the two classes (Fig. 1 C). We also observe that, while 92.7% of benign ClinVar variants have been observed in gnomAD, this is only true for 29.4% of pathogenic variants (Fig. 1 B and Table 2). The separation in the distribution of allele frequencies between pathogenic and benign variants suggests that variants with allele frequencies above the cutoff are more likely to be benign than pathogenic (Fig. 1 C, cutoff calculated from the ROC analysis, see methods and materials). While the allele frequency in gnomAD appears to be a good predictor of pathogenicity (AUC = 0.96; Fig. S3 in the supporting material, with similar results for all human proteins; AUC = 0.95), we note that this result should be taken with some caution. First, many ClinVar variants are not found in gnomAD (Fig. 1 B), limiting the practical utility. Second, since the presence in gnomAD might have been used to assign (lack of) pathogenicity, it is difficult to ensure that the two sets of data are independent.
We analyzed in which regions of the membrane protein structures the ClinVar (Fig. 1, D and E) and gnomAD (Table 2) variants are located. We find that most variants are found in soluble domains, although this is likely due to the fact that these regions make up 83% of membrane proteins (Fig. 1, D and E). Notably, however, we find that, while the numbers of known benign and pathogenic variants are similar in the different types of soluble regions, there appears to be an almost twofold excess of pathogenic variants compared with benign variants in the transmembrane regions (Fig. 1 E and Table 2). While we cannot exclude that this enrichment is in part due to an increased focus on the transmembrane region in clinical research, we suggest—in line with previous work (63,64)—that this observation also reflects a decreased mutational tolerance of the transmembrane region.
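The enrichment can be checked directly from the ClinVar counts in Table 2. The snippet below is a simple worked calculation using only numbers from that table; the variable names are illustrative.

```python
# ClinVar counts per region of human membrane proteins, copied from Table 2.
benign = {"Extracellular": 3489, "Cytoplasmic": 3544, "Transmembrane": 961, "Other": 3069}
pathogenic = {"Extracellular": 2327, "Cytoplasmic": 2233, "Transmembrane": 1863, "Other": 1603}

# Number of pathogenic variants per benign variant in each region.
ratios = {region: pathogenic[region] / benign[region] for region in benign}
for region, ratio in ratios.items():
    print(f"{region}: {ratio:.2f} pathogenic variants per benign variant")
```

The transmembrane region is the only one in which this ratio exceeds one, at about 1.9, which corresponds to the almost twofold excess described in the text.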
Membrane proteins are typically defined by their interaction and/or location within the membrane. As not all of them are located to a similar degree inside the membrane, we divided the complete data set of 5796 membrane proteins into their categories as being single-pass, multi-pass, lipid-anchored, or integral membrane proteins (see Table 3). We find that most proteins are single- (40.2%) or multi-pass membrane proteins (47.7%) and also most of the variants are found in these categories (46.1 and 44.1% of 1,867,856 variants). Looking into the transmembrane region, we see the previously described enrichment of pathogenic variants especially for multi-pass membrane proteins. This makes the multi-pass membrane protein category especially interesting for further studying of the role of residues within this region.
Table 3.
Variant counts for | All: all | All: transmembrane | Single-pass: all | Single-pass: transmembrane | Multi-pass: all | Multi-pass: transmembrane | Lipid-anchored: all | Lipid-anchored: transmembrane | Integral: all | Integral: transmembrane |
---|---|---|---|---|---|---|---|---|---|---|
Total | 1,867,856 | 258,379 | 861,305 | 30,480 | 824,532 | 218,193 | 127,370 | 0 | 54,649 | 9706 |
Benign | 11,063 | 961 | 4960 | 138 | 4794 | 744 | 817 | 0 | 492 | 79 |
Benign gnomAD | 10,260 | 908 | 4634 | 132 | 4455 | 699 | 690 | 0 | 481 | 77 |
Pathogenic | 8026 | 1863 | 2328 | 77 | 4741 | 1719 | 544 | 0 | 413 | 67 |
Pathogenic gnomAD | 2360 | 454 | 694 | 20 | 1377 | 412 | 144 | 0 | 145 | 22 |
VUS (ClinVar) | 64,584 | 6814 | 25,707 | 693 | 29,677 | 5784 | 5311 | 0 | 3889 | 337 |
VUS gnomAD | 41,511 | 4038 | 17,128 | 459 | 19,307 | 3371 | 2589 | 0 | 2487 | 208 |
Only gnomAD | 1,784,183 | 248,741 | 828,310 | 29,572 | 785,320 | 209,946 | 120,698 | 0 | 49,855 | 9223 |
Protein count | 5796 | 2330 | 2762 | 561 | 143 |
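The enrichment of pathogenic variants in the transmembrane region can be checked directly from the counts in Table 3. The sketch below is only an illustration: the helper name `pathogenic_to_benign_ratio` is hypothetical, and it assumes that the non-transmembrane counts can be approximated as the all-region counts minus the transmembrane counts. The lipid-anchored category is left out because it has no transmembrane variants.

```python
# Benign and pathogenic ClinVar counts copied from Table 3,
# stored as (all regions, transmembrane) per protein category.
BENIGN = {
    "All": (11063, 961),
    "Single-pass": (4960, 138),
    "Multi-pass": (4794, 744),
    "Integral": (492, 79),
}
PATHOGENIC = {
    "All": (8026, 1863),
    "Single-pass": (2328, 77),
    "Multi-pass": (4741, 1719),
    "Integral": (413, 67),
}


def pathogenic_to_benign_ratio(category, region):
    """Return pathogenic/benign count ratio for region 'tm' or 'non_tm'."""
    benign_all, benign_tm = BENIGN[category]
    path_all, path_tm = PATHOGENIC[category]
    if region == "tm":
        return path_tm / benign_tm
    if region == "non_tm":
        # Assumption: non-transmembrane = all regions minus transmembrane.
        return (path_all - path_tm) / (benign_all - benign_tm)
    raise ValueError("region must be 'tm' or 'non_tm'")


if __name__ == "__main__":
    for category in BENIGN:
        tm = pathogenic_to_benign_ratio(category, "tm")
        non_tm = pathogenic_to_benign_ratio(category, "non_tm")
        print(f"{category}: TM ratio {tm:.2f}, non-TM ratio {non_tm:.2f}")
```

With these counts, the transmembrane ratio for all proteins is close to two, while the single-pass transmembrane ratio stays below one, which matches the observation that the enrichment is driven mainly by multi-pass proteins.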
To gain a better understanding of the mechanisms causing benign or pathogenic variant consequences, we mapped the variants onto known protein structures. Despite recent advances in protein structure prediction (65) and analysis using computational methods (66,67), we decided to focus our work on experimentally determined structures. Specifically, we searched the Protein Data Bank (68) for structures with at least one variant in the resolved part of the protein structure and found that 27.5% of all annotated human membrane proteins have at least some part resolved and that 15.1% of the total set of variants are found in the region covered by these structures (Fig. 1 A, additional data at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al). Of the 281,220 variants found in resolved regions, only 2.2% (6119 variants) have been classified as benign (2089 variants, 18.9% of total benign variants in membrane proteins) or pathogenic (4030 variants, 50.2% of total pathogenic variants in membrane proteins) (Table S2 in the supporting material).
Computational assessment of stability and evolution shows loss of function due to loss of stability for ∼62% of disease variants in selected proteins
To examine the importance of changes in protein stability in membrane proteins for causing loss of function and disease, we analyzed a smaller set of proteins in more detail. Specifically, we searched for proteins that had at least one pathogenic and one benign variant in the transmembrane region. As our aim was to use the Rosetta software to predict changes in thermodynamic stability, we focused on protein structures that had been determined via x-ray crystallography as Rosetta has been developed and benchmarked most extensively on such structures. These requirements narrow down the set to 16 proteins: 6 transporters, 3 ion channels, 4 GPCRs, 2 enzymes, and 1 cell junction protein (Table 1 and additional data at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al). These 16 proteins represent different types of membrane proteins with diverse functions, structures, and involvement in different diseases. Of note, these proteins belong to the class of multi-pass transmembrane proteins and the secondary structure of these proteins is mostly α-helical (>50%), while unstructured regions or extended strands add up to 27% (Fig. S5 in the supporting material).
Inspired by previous analyses of soluble proteins, we investigated these membrane proteins in terms of structural stability and sequence conservation. Specifically, we developed and benchmarked a revised Rosetta protocol for stability calculations of membrane proteins (see methods and materials and supporting material). We used this method to calculate the change in thermodynamic stability (G) upon single amino acid substitutions. In each case, we selected a high-resolution structure (Table 1) and removed any cocrystallized molecules. We also constructed MSAs of each protein and used GEMME (36) to estimate the evolutionary effects of the variants. Specifically, we calculated a normalized score (E), with low values corresponding to substitutions that, in light of evolution, appear well tolerated, and high values corresponding to variants that, based on the evolutionary record, are rare or absent and are expected to cause loss of function. In analyses of soluble proteins, we have previously found that a high value of E is a good predictor for a variant to cause loss of function and that variants with both high E and high G are likely to cause loss of function via loss of stability and cellular abundance (18).
We calculated G and E for all variants that have been observed in humans and where the wild-type residue was resolved using x-ray crystallography. We did not analyze variants at positions where there was a potential incompatibility between the experimental structure and the Rosetta energy function (e.g., disulfide bridges (filter II) or residues with a positive energy, where mutations are likely more tolerated by default (filter IV)), or variants at positions covered by too few sequences in the MSA (see methods and materials, Table 1, and supporting data at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al, table 202211-counthMPannononsyndelPDBpublish, tab X-raysetapp for the variant loss at each of the sequential filtering steps and further information). After this quality control, we retain 220/324 pathogenic and 42/122 benign variants and lose one protein (ABCG8), as it does not have any variants left after filtering. Thus, we analyzed two sets of variants: group A is the set of 220 variants that are described as pathogenic in ClinVar, and group B is the set of 104 variants that are classified as benign in ClinVar and/or are nonrare gnomAD variants that, as discussed above, are more likely to be benign than pathogenic given their allele frequencies in gnomAD (Figs. 1 and S3 in the supporting material). In what follows we refer to group B as benign, but note that among the 104 variants in group B only 42 are classified as benign in ClinVar and the remainder come from gnomAD. To get an indication of the influence of this filtering process, we also performed all AUC measurements on the respective filtering steps/subsets (see additional supplemental Table 2 worksheet tab X-ray_set_app_AUC in the supporting material).
To quantify how well E and G distinguish between the two classes of variants, group A (pathogenic) and group B (benign and/or nonrare), we constructed a ROC curve and calculated the AUC as a measure of how well each of the two scores can predict pathogenicity (Fig. 2 A). Of note, to reduce possible bias from the limited data set, we performed leave-one-protein-out (LOPO) calculations when constructing the ROC curves and deriving their cutoffs, giving us mean values with standard deviation for the AUC and mean, min, and max cutoff values. Variant counts in the quadrants (and their respective percentages) are determined from the total cutoffs and the LOPO calculations. In the latter, variants that are located between the min and max LOPO cutoff values are considered as “gray” and contribute to the standard deviation of the reported percentages. Looking at our complete data, we find that both E and G can separate the group A and group B variants, although E, as expected, performs better than G (AUC 0.82 vs. 0.64). This is in line with previous observations for soluble proteins (6,7,9,11) and the hypothesis that many, but not all, pathogenic variants are destabilized so that G calculations can capture these pathogenic variants, but not those caused by other mechanisms for loss/gain of function.
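The leave-one-protein-out procedure described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the function names, the toy records, and the use of the sample standard deviation are all assumptions. The AUC is computed in its rank-based form, that is, the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, with ties counted as one half.

```python
import statistics


def auc(pathogenic_scores, benign_scores):
    """Rank-based AUC: fraction of (pathogenic, benign) pairs ranked correctly."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pathogenic_scores) * len(benign_scores))


def lopo_auc(records):
    """Leave-one-protein-out AUC.

    records: list of (protein, score, is_pathogenic) tuples.
    Returns (mean, sample standard deviation) over the held-out runs.
    """
    proteins = sorted({protein for protein, _, _ in records})
    values = []
    for held_out in proteins:
        subset = [r for r in records if r[0] != held_out]
        pos = [score for _, score, label in subset if label]
        neg = [score for _, score, label in subset if not label]
        if pos and neg:  # an AUC needs both classes
            values.append(auc(pos, neg))
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), spread


# Hypothetical toy data: three proteins, one pathogenic and one benign variant each.
TOY_RECORDS = [
    ("P1", 0.9, True), ("P1", 0.2, False),
    ("P2", 0.7, True), ("P2", 0.8, False),
    ("P3", 0.6, True), ("P3", 0.1, False),
]

if __name__ == "__main__":
    mean_auc, sd_auc = lopo_auc(TOY_RECORDS)
    print(f"LOPO AUC: {mean_auc:.2f} +/- {sd_auc:.2f}")
```

Holding out one protein at a time shows how strongly a single protein drives the result, which matters when a few proteins contribute most of the variants.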
We further analyzed the group A (pathogenic) and group B (benign and/or nonrare) variants in terms of their E and G scores (Fig. 2 B). To simplify the discussion, we analyze the variants in terms of whether E and G are low or high, with respect to cutoffs from the ROC analysis (see methods and materials). This analysis separates the variants into four quadrants with only a few variants (14.2%) falling in the quadrant of low E and high G (Fig. 2 B (IV)), which is comparable with previous observations for soluble proteins (18,19). The three remaining quadrants correspond roughly to:
- (I) variants that cause loss of function via loss of stability (high E and high G),
- (II) variants that cause loss of function for other reasons than loss of stability, such as substitutions at key functional sites (high E and low G),
- (III) variants that are expected to be tolerated both from a structural and evolutionary sequence perspective (low E and low G).
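The quadrant assignment described above amounts to two threshold comparisons. The following sketch assumes hypothetical cutoff values and treats a score equal to the cutoff as high; neither choice is taken from the study, whose cutoffs come from the ROC analysis.

```python
from collections import Counter


def quadrant(e_score, g_score, e_cutoff, g_cutoff):
    """Assign a variant to quadrant I-IV from its E and G scores.

    I:   high E, high G (loss of function via loss of stability)
    II:  high E, low G  (loss of function for other reasons)
    III: low E,  low G  (expected to be tolerated)
    IV:  low E,  high G
    """
    high_e = e_score >= e_cutoff  # assumption: ties count as high
    high_g = g_score >= g_cutoff
    if high_e and high_g:
        return "I"
    if high_e:
        return "II"
    if high_g:
        return "IV"
    return "III"


if __name__ == "__main__":
    # Hypothetical cutoffs and (E, G) scores, for illustration only.
    E_CUTOFF, G_CUTOFF = 0.5, 1.0
    variants = [(0.8, 3.0), (0.9, 0.2), (0.1, 0.0), (0.2, 2.5), (0.7, 1.4)]
    counts = Counter(quadrant(e, g, E_CUTOFF, G_CUTOFF) for e, g in variants)
    print(dict(counts))
```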
As expected, we find that most group B (benign and/or nonrare) variants have low E (75%), and 73% of those also have low G, whereas most pathogenic variants (group A) have high E (76.8%, see Fig. 2 B). Among the 167 (2 in gray area) pathogenic variants that have high E values, we find that 62.1% also have high G values, suggesting that loss of stability plays an important role for disease in the 15 investigated membrane proteins.
We observe a number of pathogenic variants with very negative G, which indicates a stabilizing effect on the structure. As recently shown (69), gain-of-function variants can lead to pathogenicity, and the variants we observe might be explained in a similar way. Confirming this will require more information and further benchmarking.
Pathogenic variants in GPCRs, especially in the transmembrane region, lose function mostly by loss of stability, while this is less prominent in transporters or other protein classes
Our data set contains several members of the main membrane protein classes, namely five transporters (98 group A/pathogenic + 62 group B [benign and/or nonrare variants]), three ion channels (11 + 15 variants), four GPCRs (54 + 28 variants), and two enzymes (24 + 13 variants) (Table 1). We examined the results from our computational predictors to probe for class-specific trends. In all four classes, evolutionary conservation predictions (E) have a high AUC (>0.8), similar to the analysis with all proteins combined (Table S3 in the supporting material). Focusing on the transmembrane region, we find a very high AUC of 0.97 for variant classification of transporters (underlying datapoints: 44 pathogenic group A + 10 group B variants). Interestingly, we find that, for GPCRs, loss of stability is the main cause of pathogenicity, as indicated by an increased AUC (0.79) for G predictions (Fig. 3 A and Table S3 in the supporting material) compared with the complete data set with 15 proteins (AUC = 0.64, Fig. 3). This is even more prominent for variants located within the transmembrane region (AUC = 0.81).
In the transmembrane region of GPCRs, 77.4% of the pathogenic variants have high G values (Fig. 3 B), suggesting that their pathogenicity is due to loss of stability. When separating residues by whether they are buried or solvent accessible and whether they lie within or outside the transmembrane region (Fig. 3 C), we see that those pathogenic variants that lose function via loss of stability are typically buried. In contrast, solvent-accessible pathogenic variants are not found to lose function due to loss of stability, and variants located in those regions are more likely to be tolerated (11.1% of group A/pathogenic compared with 44.4% of group B variants). Within the transmembrane region, most variants (90% of group A and 97% of group B/benign and/or nonrare) are buried, in contact with other residues. Looking at pathogenic variants that lose function for reasons other than loss of stability (quadrant (II)), variants in GPCRs are more often within the transmembrane region (Fig. 3 C) compared with the complete data set (Fig. 3 D), where we see a larger proportion of variants at buried sites in extracellular or intracellular environments. When we further divide residues by whether they face the lipid bilayer, we see that most of those pathogenic variants within the transmembrane region face the lipids while being in contact with other residues, as indicated by their buriedness (Fig. S4 in the supporting material), in contrast to their likely benign counterparts, which are more solvent accessible.
Next, we focused on individual proteins and examined the location and potential mechanism behind disease variants in one GPCR and one transporter protein. We used the calculated values of E and G to aid in a structural analysis of the disease variants in rhodopsin (OPSD; Fig. 4, A–C) and a glucose transporter (GTR1; Fig. 4, D–F). We examined the structures of the two proteins to find the residues that interact with ligands or cofactors and searched the literature to find residues that are known to be key to function. We find that many disease variants are located at these residues, suggesting that they directly disrupt function, and some of them also decrease stability. For example, in OPSD we find a number of disease mutants at residues that interact with the retinal cofactor as well as residues in, e.g., the so-called ionic lock (70) (Fig. 4, A and B). Similarly, many disease variants in GTR1 are located at sites known to interact with a chloride ion that is important for function (71), the sugar molecule, known inhibitors (72), or residues known to affect transport (71) (Fig. 4, C and D).
Looking across the two proteins (Fig. 4), most of the high-E, low-G disease variants are found at residues that have known functional roles. We expect such variants in quadrant (II) to lose function due to other reasons than stability (18). This also includes variation in residues in close proximity to ligands and interaction partners, which were not included in our stability calculations. Further, we find that many of the disease variants that are not located at known functional sites have both high values of E and G, suggesting that these variants instead disrupt the stability of the folded state.
Correlating physicochemical changes with variant effects
We examined the data set containing all 15 proteins and the amino acid properties within the four quadrants, where quadrant (I) contains destabilized and quadrant (II) stable variants, while both quadrants (I) and (II) are—in light of evolution—not tolerated. Quadrants (III) and (IV) are evolutionarily tolerated, but quadrant (III) contains stable and quadrant (IV) destabilized variants (see Fig. 5, A and B, and methods and materials for a more detailed quadrant definition). Across all quadrants, hydrophobic amino acids are most commonly observed (wild-type, 33%; target, 37%), which can be explained by the general preference for hydrophobic residues in membrane proteins, especially within the TM region (23). Almost 65% of the group B (benign and/or nonrare) variants located in quadrant (III) have, as expected, the same amino acid property for wild-type and target (35.1% remain hydrophobic, 21.1% charged, 8.8% polar). For the pathogenic variants that lose function due to loss of stability (quadrant (I)), we see greater changes in physicochemical properties among those substitutions (Fig. 5 A). Interestingly, in quadrant (II), where variants lose function due to other reasons than stability, we see mainly hydrophobic target amino acid types (54.8%, with one-third coming from charged to hydrophobic substitutions).
Inspired by the enrichment of pathogenic variants in the transmembrane region (Fig. 1 E), which is enriched with hydrophobic residues (23), we analyze substitutions by physicochemical properties. Specifically, we calculated the median score for G and E for each combination of wild-type and target amino acids and arranged the amino acids by hydrophobicity (73) (Fig. 5 B). For variants where the target residue is more hydrophobic (e.g., Arg to Leu, Arg to Trp, or Asp to Tyr variants), we indeed see a different pattern when looking at the median stability values (G) compared with the median E values. These variants appear to be tolerated by protein stability, but not by evolution (Fig. 5 B, dashed upper rectangle). In contrast, variants changing the residue to be less hydrophobic are indicated as not tolerated by evolution and destabilizing (Fig. 5 B, solid lower rectangle).
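The per-substitution medians described above can be computed by grouping variants on their wild-type and target amino acids and then laying the medians out in a matrix ordered by hydrophobicity. The sketch below uses made-up scores and a hypothetical four-residue ordering; the actual hydrophobicity scale (73) and the real G and E values are not reproduced here.

```python
from collections import defaultdict
from statistics import median


def median_by_substitution(variants):
    """Median score per (wild_type, target) pair.

    variants: iterable of (wild_type, target, score) tuples.
    """
    grouped = defaultdict(list)
    for wild_type, target, score in variants:
        grouped[(wild_type, target)].append(score)
    return {pair: median(scores) for pair, scores in grouped.items()}


def as_matrix(medians, order):
    """Rows are wild-type residues, columns are targets, both in the given order.

    Pairs without any observed variant are filled with None.
    """
    return [[medians.get((wt, tgt)) for tgt in order] for wt in order]


# Hypothetical scores and ordering, for illustration only.
TOY_VARIANTS = [
    ("R", "L", 0.2), ("R", "L", 0.6), ("R", "L", 1.0),
    ("D", "Y", -0.5),
    ("L", "R", 3.0), ("L", "R", 4.0),
]
TOY_ORDER = ["L", "Y", "D", "R"]  # placeholder, not the scale from (73)

if __name__ == "__main__":
    medians = median_by_substitution(TOY_VARIANTS)
    for row_label, row in zip(TOY_ORDER, as_matrix(medians, TOY_ORDER)):
        print(row_label, row)
```

Computing the same matrix once for G and once for E makes it possible to compare the two patterns cell by cell, as done for Fig. 5 B.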
Conclusions
Here, we present an analysis of missense variants and their properties within human membrane proteins. We identified unique variants in 5471 proteins, of which 99.1% are of uncertain significance and only 19,089 have been classified as pathogenic or benign. In addition, we see an almost twofold excess of pathogenic variants compared with benign variants in the transmembrane regions, which make up only 16.1% of the proteins.
We have examined the importance of changes in membrane protein stability for causing loss of function. We analyzed 15 proteins and calculated the change in thermodynamic stability (G) and evolutionary conservation (E). Our ROC analysis shows good performance in separating benign from pathogenic variants by their sequence conservation (AUC = 0.82 for E), and we find that, for our 15 analyzed transmembrane proteins, ∼62% of the pathogenic variants appear to cause loss of function via loss of stability. This indicates that loss of stability indeed plays an important role for disease variants in membrane proteins, in line with previous findings on soluble proteins, although this needs to be confirmed with studies on larger data sets. In the 15 selected proteins, we observe that most variants have a hydrophobic wild-type (33%) or target (37%) amino acid type and that almost 65% of benign and/or nonrare variants that are likely tolerated, as assessed by both E and G, do not change their amino acid type. Among pathogenic variants that lose function due to loss of stability, substitutions to charged, polar, or hydrophobic are more prominent, while we observe substitutions from less hydrophobic to more hydrophobic residues in variants that lose function due to other reasons than stability.
When analyzing the different classes of membrane proteins, we observe for transporter proteins that pathogenic variants in the transmembrane region have an AUC of 0.97 for E, and loss of stability does not appear to be the predominant factor in loss of function for the cell junction protein we examined. In contrast, pathogenic variants in GPCRs lose function mainly via loss of stability (AUC = 0.79 for G). We therefore suggest that pathogenic variants lose function via loss of stability more often in the transmembrane region of GPCRs than in the other protein classes we examined.
From a more detailed inspection of individual proteins, we found that most of the high-E, low-G disease variants are located at positions that have known functional roles, while many of the disease variants that are not located at functional sites have both high values of E and G, suggesting that these variants instead disrupt the stability of the folded state.
Our observations underline the importance of stability and the loss thereof in disease-causing variants of membrane proteins and thereby show how computational tools can aid in interpreting molecular mechanisms that underlie disease. Such functional understanding may help address the substantial challenge of classifying variants of uncertain significance (74). Given the limited number of variants and proteins within this study, utilizing recent advances such as the large number of experimental structures derived from electron microscopy or computational models from, e.g., AlphaFold (75) could enable a broader analysis. We include the collection of population and ClinVar variants for the entire human proteome to facilitate such studies on membrane proteins and beyond.
Limitations of the study
We note several limitations that should be considered when interpreting the results. Our general observations and conclusions on membrane proteins and their classes are limited by the available data and proteins, partly due to our choice to only analyze experimental structures with annotated pathogenic and benign variants.
Several membrane proteins, for example, channels and cell junction proteins, function as (homo-) oligomers. In this study, we used structures of the individual proteins for our stability calculations and thereby may miss destabilizing variants in interfaces. Those variants are more difficult to interpret using stability calculations because the contacts they affect are absent from the monomeric structures. In addition to missing interactions, conformational changes of the structure or different conformations might alter G values, and several stabilizing variants can be explained by missing interaction partners in these structures (e.g., R135W, R135L, and G121V in OPSD are missing either the ligand or an intracellular binding partner).
In general, we do not expect variants to lead to complete protein unfolding but rather a partial unfolding, which allows recognition by the protein quality control system. Due to limited available experimental data, we are not able to differentiate stages of unfolding, which might affect the accuracy of the G calculations. Furthermore, our membrane protein data set is mainly α-helical, which is also true for most human membrane proteins; however, the stability score function was parameterized and benchmarked on bacterial proteins, which are often β barrels and might fold differently compared with their helical counterparts. When we compared experimental and computational G values, we obtained a Spearman rank correlation coefficient of 0.46, leaving uncertainty about the predictability of the extent of loss of stability. It is worth noting that the correlation between two sets of experimental G measurements in GlpG (57,58) shows a Spearman correlation of 0.65, and when correlating all experimental data sets with at least 12 overlapping variants we obtain a mean Spearman correlation of 0.6. Preferences for specific amino acid properties in certain environments such as the membrane might be biased by their values within the respective scoring function. Our results also depend on how different protein regions are defined.
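For readers who want to reproduce this kind of comparison, the Spearman rank correlation is the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below is a plain-Python illustration with made-up numbers; the function names are hypothetical, and the experimental and computed G values from the benchmark are not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical experimental vs. computed stability changes.
    experimental = [0.3, 1.2, 2.5, 0.8]
    computed = [0.1, 2.0, 1.5, 3.0]
    print(f"Spearman correlation: {spearman(experimental, computed):.2f}")
```

Because it depends only on ranks, this measure asks whether the calculations order variants correctly, not whether they predict the magnitude of destabilization.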
Evolutionary sequence conservation measurements cannot give direct insights into the mechanism that causes pathogenicity. Variants labeled loss of function here may instead exhibit the rarer gain of function. Furthermore, our calculations of E scores depend on the MSA, and we note that using a different MSA, e.g., by changing the E value cutoff, could shift some of the E values from tolerated to not tolerated.
The filters we apply on the variants are chosen based on literature and experience. A more detailed analysis on the effects of this filtering (and their cutoffs) with a larger data set of variants is needed. For now, we applied AUC calculations on each of the filtering steps to address a potential bias (see supporting material).
Finally, and as already discussed above, we combine benign and/or nonrare gnomAD variants into group B. This should be taken into account when interpreting the results and especially when investigating outliers of group B, as those could be variants of unknown significance.
Author contributions
K.L.-L. and A.S. conceived the original idea and supervised the project. J.K.S.T. retrieved and processed all data with contribution by H.Z. who extracted and processed the ClinVar and gnomAD data. J.K.S.T. and A.S. designed the Rosetta pipeline framework and J.K.S.T. implemented and benchmarked it with K.L.-L. and A.S. J.K.S.T. performed all calculations and processing of the data. J.K.S.T. analyzed the data and interpreted the results with K.L.-L. and A.S. J.K.S.T., K.L.-L., and A.S. wrote the manuscript with input from H.Z.
Data availability
All scripts are available at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al and as stated in the “utilized software” section.
All parsed data from UniProt, ClinVar, and gnomAD for the human proteome are available at https://sid.erda.dk/sharelink/c3rDfqR8nn. All parsed data from wwPDB for the human membrane proteome are available at https://sid.erda.dk/sharelink/AtoGToVaZ8. All data related to the human membrane proteome and our x-ray subset are available at https://sid.erda.dk/sharelink/ds37GLjR8U. All data related to the MP stability benchmark are available at https://sid.erda.dk/sharelink/foHTKP49jC.
Acknowledgments
We thank Matteo Cagiada for providing an automatic pipeline for GEMME calculations and Kristoffer Enøe Johansson for resourcing us with his alignment and merging implementations. Additional thanks go to Julia Koehler Leman for helpful discussion regarding membrane protein implementations in Rosetta. This study was funded by the Protein Interactions and Stability in Medicine and Genomics (PRISM) center funded by the Novo Nordisk Foundation (NNF18OC0033950, to A.S. and K.L.-L.) and a grant from the Lundbeck Foundation (R272-2017-4528, to A.S.). We acknowledge access to resources from the Department of Biology’s core facility for biocomputing.
Declaration of interests
The authors declare no competing interests.
Editor: Diego Ferreiro.
Footnotes
Henrike Zschach’s present address is Center for Health Data Science, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Supporting material can be found online at https://doi.org/10.1016/j.bpj.2022.12.031.
Contributor Information
Kresten Lindorff-Larsen, Email: lindorff@bio.ku.dk.
Amelie Stein, Email: amelie.stein@bio.ku.dk.
Supporting citations
References (76,77,78,79,80,81,82,83,84,85) appear in the supporting material.
References
- 1. Soskine M., Tawfik D.S. Mutational effects and the evolution of new protein functions. Nat. Rev. Genet. 2010;11:572–582. doi: 10.1038/nrg2808. https://www.nature.com/articles/nrg2808
- 2. Pey A.L., Stricher F., et al. Martinez A. Predicted effects of missense mutations on native-state stability account for phenotypic outcome in phenylketonuria, a paradigm of misfolding diseases. Am. J. Hum. Genet. 2007;81:1006–1024. doi: 10.1086/521879.
- 3. Yue P., Li Z., Moult J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 2005;353:459–473. doi: 10.1016/j.jmb.2005.08.020.
- 4. Casadio R., Vassura M., et al. Luigi Martelli P. Correlating disease-related mutations to their effect on protein stability: a large-scale analysis of the human proteome. Hum. Mutat. 2011;32:1161–1170. doi: 10.1002/humu.21555.
- 5. Martelli P.L., Fariselli P., et al. Casadio R. Large scale analysis of protein stability in OMIM disease related human protein variants. BMC Genom. 2016;17(Suppl 2):397. doi: 10.1186/s12864-016-2726-y.
- 6. Nielsen S.V., Stein A., et al. Hartmann-Petersen R. Predicting the impact of Lynch syndrome-causing missense mutations from structural calculations. PLoS Genet. 2017;13:e1006739. doi: 10.1371/journal.pgen.1006739.
- 7. Abildgaard A.B., Stein A., et al. Hartmann-Petersen R. Computational and cellular studies reveal structural destabilization and degradation of MLH1 variants in Lynch syndrome. Elife. 2019;8:e49138. doi: 10.7554/eLife.49138.
- 8. Gersing S.K., Wang Y., Hartmann-Petersen R. Mapping the degradation pathway of a disease-linked aspartoacylase variant. PLoS Genet. 2021;17:e1009539. doi: 10.1371/journal.pgen.1009539.
- 9. Scheller R., Stein A., et al. Hartmann-Petersen R. Toward mechanistic models for genotype-phenotype correlations in phenylketonuria using protein stability calculations. Hum. Mutat. 2019;40:444–457. doi: 10.1002/humu.23707.
- 10. Clausen L., Stein A., et al. Hartmann-Petersen R. Folliculin variants linked to Birt-Hogg-Dubé syndrome are targeted for proteasomal degradation. PLoS Genet. 2020;16:e1009187. doi: 10.1371/journal.pgen.1009187.
- 11. Stein A., Fowler D.M., Hartmann-Petersen R., Lindorff-Larsen K. Biophysical and mechanistic models for disease-causing protein variants. Trends Biochem. Sci. 2019;44:575–588. doi: 10.1016/j.tibs.2019.01.003. http://www.sciencedirect.com/science/article/pii/S0968000419300039
- 12. Park H., Bradley P., Greisen P., Liu Y., Mulligan V.K., Kim D.E., Baker D., DiMaio F. Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. J. Chem. Theor. Comput. 2016;12:6201–6212. doi: 10.1021/acs.jctc.6b00819.
- 13. Guerois R., Nielsen J.E., Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 2002;320:369–387. doi: 10.1016/S0022-2836(02)00442-4.
- 14. Kellogg E.H., Leaver-Fay A., Baker D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins. 2011;79:830–838. doi: 10.1002/prot.22921.
- 15. Ó Conchúir S., Barlow K.A., Pache R.A., Ollikainen N., Kundert K., O’Meara M.J., Smith C.A., Kortemme T. A web resource for standardized benchmark datasets, metrics, and rosetta protocols for macromolecular modeling and design. PLoS One. 2015;10 doi: 10.1371/journal.pone.0130433.
- 16. Frenz B., Lewis S.M., et al. Song Y. Prediction of protein mutational free energy: benchmark and sampling improvements increase classification accuracy. Front. Bioeng. Biotechnol. 2020;8:558247. doi: 10.3389/fbioe.2020.558247.
- 17. Jepsen M.M., Fowler D.M., et al. Lindorff-Larsen K. In: Protein Homeostasis Diseases. Pey A.L., editor. Academic Press; 2020. Chapter 5 - classifying disease-associated variants using measures of protein activity and stability; pp. 91–107. https://www.sciencedirect.com/science/article/pii/B9780128191323000051
- 18. Cagiada M., Johansson K.E., et al. Lindorff-Larsen K. Understanding the origins of loss of protein function by analyzing the effects of thousands of variants on activity and abundance. Mol. Biol. Evol. 2021;38:3235–3246. doi: 10.1093/molbev/msab095.
- 19. Høie M.H., Cagiada M., et al. Lindorff-Larsen K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep. 2022;38:110207. doi: 10.1016/j.celrep.2021.110207.
- 20. Meng X., Clews J., et al. Ford R.C. The cystic fibrosis transmembrane conductance regulator (CFTR) and its stability. Cell. Mol. Life Sci. 2017;74:23–38. doi: 10.1007/s00018-016-2386-8.
- 21. Kampmeyer C., Nielsen S.V., et al. Hartmann-Petersen R. Blocking protein quality control to counter hereditary cancers. Genes Chromosomes Cancer. 2017;56:823–831. doi: 10.1002/gcc.22487.
- 22. Uhlén M., Fagerberg L., et al. Pontén F. Proteomics. Tissue-based map of the human proteome. Science. 2015;347:1260419. doi: 10.1126/science.1260419.
- 23. von Heijne G. The membrane protein universe: what’s out there and why bother? J. Intern. Med. 2007;261:543–557. doi: 10.1111/j.1365-2796.2007.01792.x.
- 24. Hauser A.S., Attwood M.M., et al. Gloriam D.E. Trends in GPCR drug discovery: new agents, targets and indications. Nat. Rev. Drug Discov. 2017;16:829–842. doi: 10.1038/nrd.2017.178. https://www.nature.com/articles/nrd.2017.178
- 25. Sanders C.R., Nagy J.K. Misfolding of membrane proteins in health and disease: the lady or the tiger? Curr. Opin. Struct. Biol. 2000;10:438–442. doi: 10.1016/s0959-440x(00)00112-3.
- 26. Hamel C. Retinitis pigmentosa. Orphanet J. Rare Dis. 2006;1:40. doi: 10.1186/1750-1172-1-40.
- 27. Koepsell H. Glucose transporters in brain in health and disease. Pflügers Archiv. 2020;472:1299–1343. doi: 10.1007/s00424-020-02441-x.
- 28. Vanier M.T. Niemann-Pick disease type C. Orphanet J. Rare Dis. 2010;5:16. doi: 10.1186/1750-1172-5-16.
- 29. Cournia Z., Allen T.W., et al. Bondar A.-N. Membrane protein structure, function and dynamics: a perspective from experiments and theory. J. Membr. Biol. 2015;248:611–640. doi: 10.1007/s00232-015-9802-0. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515176/
- 30. Hong H. In: Lipids in Protein Misfolding. Gursky O., editor. Springer International Publishing; Cham: 2015. Role of lipids in folding, misfolding and function of integral membrane proteins; pp. 1–31. (Advances in Experimental Medicine and Biology).
- 31. Booth P.J., Clarke J. Membrane protein folding makes the transition. Proc. Natl. Acad. Sci. USA. 2010;107:3947–3948. doi: 10.1073/pnas.0914478107. https://www.pnas.org/content/107/9/3947
- 32. Chang Y.-C., Bowie J.U. Measuring membrane protein stability under native conditions. Proc. Natl. Acad. Sci. USA. 2014;111:219–224. doi: 10.1073/pnas.1318576111.
- 33. Boland C., Olatunji S., et al. Caffrey M. Membrane (and soluble) protein stability and binding measurements in the lipid cubic phase using label-free differential scanning fluorimetry. Anal. Chem. 2018;90:12152–12160. doi: 10.1021/acs.analchem.8b03176.
- 34. Marx D.C., Fleming K.G. Membrane proteins enter the fold. Curr. Opin. Struct. Biol. 2021;69:124–130. doi: 10.1016/j.sbi.2021.03.006. https://www.sciencedirect.com/science/article/pii/S0959440X21000440
- 35. Alford R.F., Fleming P.J., et al. Gray J.J. Protein structure prediction and design in a biologically realistic implicit membrane. Biophys. J. 2021;120:4635. doi: 10.1016/j.bpj.2021.09.019. https://www.sciencedirect.com/science/article/pii/S0006349521007530
- 36.Laine E., Karami Y., Carbone A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol. Biol. Evol. 2019;36:2604–2619. doi: 10.1093/molbev/msz179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Frazer J., Notin P., et al. Marks D.S. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021;599:91–95. doi: 10.1038/s41586-021-04043-8. [DOI] [PubMed] [Google Scholar]
- 38.Feinauer C., Weigt M. Context-aware prediction of pathogenicity of missense mutations involved in human disease. bioRxiv. 2017 doi: 10.1101/103051v1. Preprint at. [DOI] [Google Scholar]
- 39.Nicoludis J.M., Gaudet R. Biochimica et Biophysica Acta (BBA) - Biomembranes 1860. 2018. Applications of sequence coevolution in membrane protein biochemistry; pp. 895–908.https://www.sciencedirect.com/science/article/pii/S0005273617303140 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lin Z., Akin H., et al. Rives A. Evolutionary-scale prediction of atomic level protein structure with a language model. 2022. https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2 [DOI] [PubMed]
- 41.Gerasimavicius L., Livesey B.J., Marsh J.A. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat. Commun. 2022;13:3895. doi: 10.1038/s41467-022-31686-6. https://www.nature.com/articles/s41467-022-31686-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.UniProt Consortium UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–D489. doi: 10.1093/nar/gkaa1100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Landrum M.J., Lee J.M., et al. Maglott D.R. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46:D1062–D1067. doi: 10.1093/nar/gkx1153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Karczewski K.J., Francioli L.C., et al. MacArthur D.G. The mutational constraint spectrum quantified from variation in 141, 456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7. https://www.nature.com/articles/s41586-020-2308-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Cock P.J.A., Antao T., et al. de Hoon M.J.L. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Remmert M., Biegert A., et al. Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods. 2011;9:173–175. doi: 10.1038/nmeth.1818. [DOI] [PubMed] [Google Scholar]
- 47.Mirdita M., von den Driesch L., et al. Steinegger M. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 2017;45:D170–D176. doi: 10.1093/nar/gkw1081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.UniProt Consortium UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47:D506–D515. doi: 10.1093/nar/gky1049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Ruan J., Liu Z., et al. Yu G. DBS: a fast and informative segmentation algorithm for DNA copy number analysis. BMC Bioinf. 2019;20:1. doi: 10.1186/s12859-018-2565-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Koehler Leman J., Lyskov S., Bonneau R. Computing structure-based lipid accessibility of membrane proteins with mp_lipid_acc in RosettaMP. BMC Bioinf. 2017;18:115. doi: 10.1186/s12859-017-1541-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- 52.Touw W.G., Baakman C., et al. Vriend G. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015;43:D364–D368. doi: 10.1093/nar/gku1028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Lomize M.A., Pogozheva I.D., et al. Lomize A.L. OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. 2012;40:D370–D376. doi: 10.1093/nar/gkr703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Alford R.F., Koehler Leman J., et al. Gray J.J. An integrated framework advancing membrane protein modeling and design. PLoS Comput. Biol. 2015;11:e1004398. doi: 10.1371/journal.pcbi.1004398. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Koehler Leman J., Mueller B.K., Gray J.J. Expanding the toolkit for membrane protein modeling in Rosetta. Bioinformatics. 2017;33:754–756. doi: 10.1093/bioinformatics/btw716. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Koehler Leman J., Lyskov S., et al. Bonneau R. Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks. Nat. Commun. 2021;12:6947. doi: 10.1038/s41467-021-27222-7. https://www.nature.com/articles/s41467-021-27222-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Baker R.P., Urban S. Architectural and thermodynamic principles underlying intramembrane protease function. Nat. Chem. Biol. 2012;8:759–768. doi: 10.1038/nchembio.1021. https://www.nature.com/articles/nchembio.1021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Paslawski W., Lillelund O.K., et al. Otzen D.E. Cooperative folding of a polytopic α-helical membrane protein involves a compact N-terminal nucleus and nonnative loops. Proc. Natl. Acad. Sci. USA. 2015;112:7978–7983. doi: 10.1073/pnas.1424751112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Krzanowski W.J., Hand D.J. Chapman and Hall/CRC; New York: 2009. ROC Curves for Continuous Data. [Google Scholar]
- 60.Fleishman S.J., Leaver-Fay A., et al. Baker D. RosettaScripts: a scripting language interface to the rosetta macromolecular modeling suite. PLoS One. 2011;6:e20161. doi: 10.1371/journal.pone.0020161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Khatib F., Cooper S., Tyka M.D., Xu K., Makedon I., Popović Z., Baker D., Players F. Algorithm discovery by protein folding game players. Proc. Natl. Acad. Sci. USA. 2011;108:18949–18953. doi: 10.1073/pnas.1115898108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Maguire J.B., Haddox H.K., et al. Kuhlman B. Perturbing the energy landscape for improved packing during computational protein design. Proteins. 2021;89:436–449. doi: 10.1002/prot.26030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Zaucha J., Heinzinger M., et al. Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Briefings Bioinf. 2021;22:bbaa132. doi: 10.1093/bib/bbaa132. [DOI] [PubMed] [Google Scholar]
- 64.Lee E., Manoil C. Mutations eliminating the protein export function of a membrane-spanning sequence. J. Biol. Chem. 1994;269:28822–28828. https://www.sciencedirect.com/science/article/pii/S0021925819619800 [PubMed] [Google Scholar]
- 65.Varadi M., Anyango S., et al. Velankar S. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022;50:D439–D444. doi: 10.1093/nar/gkab1061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.del Alamo D., Sala D., et al. Meiler J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. Elife. 2022;11:e75751. doi: 10.7554/eLife.75751. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Akdel M., Pires D.E.V., et al. Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 2022;29:1056–1067. doi: 10.1038/s41594-022-00849-w. https://www.nature.com/articles/s41594-022-00849-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.wwPDB consortium Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 2019;47:D520–D528. doi: 10.1093/nar/gky949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Sörmann J., Schewe M., et al. Tucker S.J. Gain-of-function mutations in KCNK3 cause a developmental disorder with sleep apnea. Nat. Genet. 2022;54:1534–1543. doi: 10.1038/s41588-022-01185-x. https://www.nature.com/articles/s41588-022-01185-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Hofmann K.P., Scheerer P., et al. Ernst O.P. A G protein-coupled receptor at work: the rhodopsin model. Trends Biochem. Sci. 2009;34:540–552. doi: 10.1016/j.tibs.2009.07.005. [DOI] [PubMed] [Google Scholar]
- 71.Custódio T.F., Paulsen P.A., et al. Pedersen B.P. Structural comparison of GLUT1 to GLUT3 reveal transport regulation mechanism in sugar porter family. Life Science Alliance. 2021;4:1–12. doi: 10.26508/lsa.202000858. https://www.life-science-alliance.org/content/4/4/e202000858 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Kapoor K., Finer-Moore J.S., et al. Stroud R.M. Mechanism of inhibition of human glucose transporter GLUT1 is conserved between cytochalasin B and phenylalanine amides. Proc. Natl. Acad. Sci. USA. 2016;113:4711–4716. doi: 10.1073/pnas.1603735113. https://europepmc.org/articles/PMC4855560 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Zhao G., London E. An amino acid ”transmembrane tendency” scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: relationship to biological hydrophobicity. Protein Sci. 2006;15:1987–2001. doi: 10.1110/ps.062286306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Anderson C.L., Munawar S., et al. Eckhardt L.L. How functional genomics can Keep pace with VUS identification. Front. Cardiovasc. Med. 2022;9:900431. doi: 10.3389/fcvm.2022.900431. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Jumper J., Evans R., et al. Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. https://www.nature.com/articles/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Gaffney K.A., Hong H. The rhomboid protease GlpG has weak interaction energies in its active site hydrogen bond network. J. Gen. Physiol. 2019;151:282–291. doi: 10.1085/jgp.201812047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Guo R., Gaffney K., et al. Hong H. Steric trapping reveals a cooperativity network in the intramembrane protease GlpG. Nat. Chem. Biol. 2016;12:353–360. doi: 10.1038/nchembio.2048. https://www.nature.com/articles/nchembio.2048 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Min D., Jefferson R.E., et al. Yoon T.-Y. Mapping the energy landscape for second-stage folding of a single membrane protein. Nat. Chem. Biol. 2015;11:981–987. doi: 10.1038/nchembio.1939. https://www.nature.com/articles/nchembio.1939 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Hong H., Park S., et al. Tamm L.K. Role of aromatic side chains in the folding and thermodynamic stability of integral membrane proteins. J. Am. Chem. Soc. 2007;129:8320–8327. doi: 10.1021/ja068849o. [DOI] [PubMed] [Google Scholar]
- 80.Hong H., Szabo G., Tamm L.K. Electrostatic couplings in OmpA ion-channel gating suggest a mechanism for pore opening. Nat. Chem. Biol. 2006;2:627–635. doi: 10.1038/nchembio827. https://www.nature.com/articles/nchembio827 [DOI] [PubMed] [Google Scholar]
- 81.Moon C.P., Fleming K.G. Side-chain hydrophobicity scale derived from transmembrane protein folding into lipid bilayers. Proc. Natl. Acad. Sci. USA. 2011;108:10174–10177. doi: 10.1073/pnas.1103979108. https://www.pnas.org/content/108/25/10174 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Stanley A.M., Fleming K.G. The role of a hydrogen bonding network in the transmembrane β-barrel OMPLA. J. Mol. Biol. 2007;370:912–924. doi: 10.1016/j.jmb.2007.05.009. https://www.sciencedirect.com/science/article/pii/S0022283607006213 [DOI] [PubMed] [Google Scholar]
- 83.McDonald S.K., Fleming K.G. Aromatic side chain water-to-lipid transfer free energies show a depth dependence across the membrane normal. J. Am. Chem. Soc. 2016;138:7946–7950. doi: 10.1021/jacs.6b03460. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Marx D.C., Fleming K.G. Influence of protein scaffold on side-chain transfer free energies. Biophys. J. 2017;113:597–604. doi: 10.1016/j.bpj.2017.06.032. https://www.cell.com/biophysj/abstract/S0006-3495(17)30682-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Huysmans G.H.M., Baldwin S.A., et al. Radford S.E. The transition state for folding of an outer membrane protein. Proc. Natl. Acad. Sci. USA. 2010;107:4099–4104. doi: 10.1073/pnas.0911904107. https://www.pnas.org/content/107/9/4099 [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
All scripts are available at https://github.com/KULL-Centre/papers/tree/main/2022/hMP-Xray-Tiemann-et-al and as stated in the “utilized software” section.
All parsed data from UniProt, ClinVar, and gnomAD for the human proteome are available at https://sid.erda.dk/sharelink/c3rDfqR8nn. All parsed data from wwPDB for the human membrane proteome are available at https://sid.erda.dk/sharelink/AtoGToVaZ8. All data related to the human membrane proteome and our X-ray subset are available at https://sid.erda.dk/sharelink/ds37GLjR8U. All data related to the membrane protein (MP) stability benchmark are available at https://sid.erda.dk/sharelink/foHTKP49jC.