Abstract
Motivation
Many peripheral membrane proteins have evolved to transiently interact with the surface of (curved) lipid bilayers. Methods to quantitatively predict sensing and binding free energies for protein sequences or structures are currently lacking; such tools could greatly benefit the discovery of membrane-interacting motifs, as well as their de novo design.
Results
Here, we trained a transformer neural network model on molecular dynamics data for more than 50 000 peptides to accurately predict the (relative) membrane-binding free energy for any given amino acid sequence. Using this information, our physics-informed model is able to classify a peptide’s membrane-associative activity as either non-binding, curvature sensing, or membrane binding. Moreover, this method can be applied to detect membrane-interaction regions in a wide variety of proteins, with predictive performance comparable to that of state-of-the-art data-driven tools like DREAMM, PPM3, and MODA, but with wider applicability regarding protein diversity and the added ability to distinguish curvature sensing from general membrane binding.
Availability and implementation
We made these tools available as a web server, coined Protein-Membrane Interaction predictor (PMIpred), which can be accessed at https://pmipred.fkt.physik.tu-dortmund.de.
Introduction
Peripheral membrane proteins (PMPs) are a class of proteins that transiently adhere to the surface of the biological membranes that encapsulate and compartmentalize cells. These protein–membrane interactions are crucial for the protein’s subcellular localization and related function, e.g. in membrane remodelling, transport, or lipid metabolism (Whited and Johs 2015, Monje-Galvan and Klauda 2016).
The membrane partitioning of PMPs is often regulated by distinct protein regions that have evolved to optimally guide these proteins to their functional locations. One striking example is the amphipathic lipid packing sensing (ALPS) motif that facilitates specific binding to curved membranes and not to flat ones. ALPS was first described as part of the ArfGAP1 protein that regulates COPI coat assembly in a curvature-specific manner (Bigay et al. 2003). With this original ALPS motif as a template, Drin et al. (2007) defined a set of physicochemical criteria (e.g. prevalence of certain residues, net charge, hydrophobicity, hydrophobic moment) that allowed for a bioinformatic screening of protein databases, which accelerated the discovery of many other curvature-sensing proteins.
Despite this breakthrough, there are many membrane-active protein motifs that display similar curvature-related behaviour but do not fulfill Drin’s criteria. For example, the AH repeats of α-synuclein specifically bind to curved anionic membranes (Davidson et al. 1998), but strongly differ from ALPS in their amino acid composition (Pranke et al. 2011). Hence, it is suggested that curvature sensing is governed by a subtle balance between the chemical properties of the hydrophobic and the hydrophilic faces (Drin and Antonny 2010) that is hard to generalize using simple physicochemical descriptors.
The example of curvature sensing shows that it is not trivial to predict which regions of a given protein structure are responsible for membrane interactions, let alone to distinguish curvature specificity at the same time. Previously developed methods include data-informed classifiers that predict which amino acids in a protein structure are membrane-interacting (e.g. MODA (Kufareva et al. 2014) and DREAMM (Chatzigoulas and Cournia 2022a,b)) or describe the general protein orientation with respect to a (curved) membrane (e.g. PPM3 (Lomize et al. 2022)). Although these methods perform well for their respective protein targets, they lack the quantitative nature that is required to detect the more subtle interactions involved in curvature sensing.
In this article, we present an alternative method that is able to predict protein–membrane interactions for peptides and protein structures from a bottom-up physics-based perspective. Our approach involves a transformer neural network (NN) model that is trained on a large set of molecular dynamics (MD) data on relative binding free energies for short peptides (24 residues) interacting with model membranes. Since our data were initially gathered during the optimization of curvature sensing (van Hilten et al. 2023), our model is additionally able to distinguish curvature- (or, equivalently, lipid packing defect-) sensing from general membrane binding. Moreover, because of its physics-informed character, it is, compared to previous data-driven methods, better able to generalize membrane-interaction features across a wide variety of protein families, including lipid kinases, ALPS-containing curvature sensors, α-synuclein, and N-BAR proteins. To enable researchers to access our tool without requiring any programming skills, we incorporated it in our Protein–Membrane Interaction predictor (“PMIpred”) web server, which is available at https://pmipred.fkt.physik.tu-dortmund.de.
Approach
Theoretical background
In previous work, we developed a free-energy calculation method to efficiently quantify the curvature-sensing free energy (ΔΔF) for a peptide within (coarse-grained) MD simulations (van Hilten et al. 2022). ΔΔF is the difference in affinity for a curved/stretched membrane versus a flat equilibrium membrane. Since binding to the lipid packing defects at curved membrane surfaces negates an energetic penalty, ΔΔF has a negative value for most peptides. The larger the magnitude (|ΔΔF|), the stronger the sensing propensity.
Because ΔΔF is linearly related to the sequence length L, we can extrapolate it to our maximal reference length of 24 amino acids, which enables fair comparison between peptides of different lengths:
| (1) |
The constants of Equation (1) are reported in van Hilten et al. (2023). Next, to account for the fact that the majority of biomembranes are negatively charged while the membranes in our simulations were neutral, we add a further correction:
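$$\Delta\Delta F_{\mathrm{adj}} = \Delta\Delta F_{L24} - c\,z \qquad (2)$$
where $\Delta\Delta F_{L24}$ is the length-extrapolated curvature-sensing free energy from Equation (1), $z$ is the net charge of the peptide, and $c = 0.93$ kJ mol⁻¹. Adding a positive charge thus reduces the free energy by 0.93 kJ mol⁻¹ and, conversely, adding a negative charge increases it by 0.93 kJ mol⁻¹.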
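As an illustration, the charge correction can be written in a few lines of Python. This is a minimal sketch rather than the PMIpred implementation: the function names are hypothetical, and the net charge is estimated with the simplifying assumption that Lys/Arg count as +1, Asp/Glu as −1, and all other residues (including His and the termini) as neutral.

```python
CHARGE_COEFF_KJ_MOL = 0.93  # kJ/mol per unit charge, as stated in the text


def net_charge(sequence: str) -> int:
    """Estimate the net charge z of a peptide.

    Assumption (not from the paper): K/R = +1, D/E = -1, everything else 0.
    """
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def charge_adjust(free_energy: float, z: int,
                  coeff: float = CHARGE_COEFF_KJ_MOL) -> float:
    """Lower the free energy by `coeff` per positive charge (raise it per negative charge)."""
    return free_energy - coeff * z


if __name__ == "__main__":
    z = net_charge("KKLLKELLKK")  # hypothetical example sequence
    print(z, charge_adjust(-8.0, z))
```

For a neutral target membrane the correction would simply be skipped.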
By combining this approach with an evolutionary algorithm (EA) to efficiently navigate the vast search space of possible sequence combinations, we found that the optimal curvature sensor is hydrophobic and bulky, indicating that the curvature-sensing free energy ΔΔF and the general membrane-binding free energy ΔF(R), for binding to a vesicle with radius R, are strongly correlated:
| (3) |
By fitting our curvature-sensing free-energy data to thermodynamic integration simulations that yield the membrane-binding free energy, we determined the constants of Equation (3). The relative strain of the target membrane in our MD simulations equals 0.165. The estimation of the binding free energy could alternatively be fine-tuned in future generations of the model via incorporation of additional empirically derived algorithmic rules, such as those of Hristova and White (2005), in order to correct for systematic force field errors.
Finally, we theoretically derived that the membrane-binding probability relates to this membrane-binding free energy according to a sigmoid-like switch function:
| (4) |
In which the ratio between the number of realizations in solvent and on the membrane is estimated from typical experimental setups. This curve, plotted in Fig. 1A, captures three distinct regimes: non-binders (peptides stay in solution), curvature sensors (peptides only bind to curved membranes), and binders (peptides bind to any membrane). Details on the derivations of Equations (1–4) and their constants can be found in our previous publication (van Hilten et al. 2023).
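To make the sigmoid-like switch concrete, the sketch below uses a generic two-state Boltzmann form, P = 1 / (1 + r·exp(ΔF / RT)), where r is the solvent-to-membrane ratio of realizations. This functional form, the temperature of 300 K, and the parameter names are assumptions made for illustration; the exact expression and the value of r used by PMIpred are given in van Hilten et al. (2023) and are not reproduced here.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def binding_probability(delta_f: float,
                        solvent_to_membrane_ratio: float,
                        temperature: float = 300.0) -> float:
    """Two-state estimate of the membrane-bound fraction.

    delta_f: membrane-binding free energy in kJ/mol (negative = favourable).
    solvent_to_membrane_ratio: hypothetical placeholder for the ratio of
        realizations in solvent versus on the membrane.
    """
    reduced = delta_f / (GAS_CONSTANT * temperature)
    return 1.0 / (1.0 + solvent_to_membrane_ratio * math.exp(reduced))


if __name__ == "__main__":
    # Scan a range of binding free energies for an arbitrary ratio of 100.
    for delta_f in (0.0, -5.0, -10.0, -15.0, -20.0):
        print(delta_f, round(binding_probability(delta_f, 100.0), 4))
```

In this form, a larger ratio shifts the transition towards more negative binding free energies without changing its steepness.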
Figure 1.
(A) The membrane-binding probability as a function of the membrane-binding free energy (at R = 50 nm) shows a sharp transition. Please note the distinction between this binding free energy and the relative curvature-sensing free energy ΔΔF, which are linearly correlated (Eq. 3 and Fig. 4C in our previous work (van Hilten et al. 2023)). The orange area marks the ‘sensor regime’ (van Hilten et al. 2023). Insets show cartoon explanations of the three classes. (B) Correlation plot for 96 sequences (all 24 residues long, not part of the training data) spanning the full free-energy range. MD-calculated values are averages and standard deviations for 3 independent replicas (500 ns simulation per run, as described in van Hilten et al. (2023)). (C) Schematic workflow of predicting membrane-interaction regions in protein structures within PMIpred. The trained transformer model predicts the free energy (in kJ mol⁻¹) for overlapping peptide segments from the protein sequence. From the 3D protein structure, per-residue SASA values are calculated. Finally, those two features are combined to yield the final quantitative prediction, as visualized by PMIpred. Non-binding and buried residues are shown in purple. Sensing residues are shown in orange. Binding residues are shown in red. The protein structure shown here is the epsin1 ENTH domain (PDB: 5onf).
Transformer model
The randomly initiated evolutionary optimization of curvature-sensing peptides we performed previously (van Hilten et al. 2023) yielded a large dataset of 53 940 unique peptide sequences (all 24 residues long) and their respective free-energy values, as calculated by coarse-grained MD simulations (see Supplementary Section S1). An advantage of such a physics-based dataset is that it spans the entire applicability domain from ‘inactive’ to the theoretical optimum. Previously, we showed that a convolutional neural network (CNN) trained on these data is capable of accurately predicting a peptide’s free-energy value and classifying its membrane-binding behaviour according to the three aforementioned categories (non-binder, sensor, binder, see Fig. 1A) (van Hilten et al. 2023). Moreover, it did so across different peptide families and with point-mutation sensitivity.
In the current work, we follow up on this by applying a transformer model to the same problem. Transformer models use a so-called ‘attention mechanism’ to learn patterns from text input, making them especially useful in natural language processing (Vaswani et al. 2017). Transformer NNs are able to process whole sentences (or in our case, peptide sequences) at once, which gives them two advantages compared to CNNs and long short-term memory models: (1) they can process input faster, and (2) they are better able to learn long-range relations between elements, which is important when, for example, the first and last word of a long input sentence are conceptually and/or grammatically connected.
All details on the transformer architecture, optimization, and training are described in the supplementary Section S1.
Our final transformer model is able to predict the free energies calculated from MD simulations with an expected error (root-mean-square deviation) of 1.79 kJ mol⁻¹ across the full free-energy range (Fig. 1B), which is similar to the typical standard deviations we get for the MD simulations in the training data.
Identifying membrane-interaction regions in protein structures
In order to utilize our method to detect membrane-interaction regions in PMPs, we apply a sliding window that divides the full protein sequence into equally sized segments and predicts a free-energy value for each of them. By default, we consider segments of 15 residues, which is a typical length for a secondary structure element in a protein. A ‘per-residue’ score is obtained by taking the average of the overlapping segment scores at each position (Fig. 1C). These per-residue values can be interpreted as the contribution of that single amino acid to the overall membrane-binding ability of the protein region it is in. When classifying residues as non-binders, sensors, or binders, we use the free-energy thresholds that follow from the definitions in Equations (1–4): above −6.4 kJ mol⁻¹ for non-binders, between −10.0 and −6.4 kJ mol⁻¹ for sensors, and below −10.0 kJ mol⁻¹ for binders.
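The sliding-window averaging and thresholding can be sketched as follows. This is an illustrative outline under stated assumptions, not the PMIpred code: `predict` is a hypothetical stand-in for the trained transformer, the window is assumed to advance one residue at a time, and values exactly on a threshold are assigned to the sensor class.

```python
from typing import Callable, List

NON_BINDER_THRESHOLD = -6.4  # kJ/mol, from the text
BINDER_THRESHOLD = -10.0     # kJ/mol, from the text


def per_residue_scores(sequence: str,
                       predict: Callable[[str], float],
                       window: int = 15) -> List[float]:
    """Average the scores of all overlapping segments covering each residue."""
    n = len(sequence)
    if n < window:
        raise ValueError("sequence is shorter than the sliding window")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):  # assumed stride of one residue
        score = predict(sequence[start:start + window])
        for i in range(start, start + window):
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def classify(score: float) -> str:
    """Return '-', 'S', or 'B' for non-binder, sensor, or binder."""
    if score > NON_BINDER_THRESHOLD:
        return "-"
    if score < BINDER_THRESHOLD:
        return "B"
    return "S"


if __name__ == "__main__":
    # Toy predictor for demonstration only: -1 kJ/mol per leucine in the segment.
    toy_predict = lambda segment: -float(segment.count("L"))
    scores = per_residue_scores("A" * 15 + "L" * 15, toy_predict)
    print("".join(classify(s) for s in scores))
```

Note that residues near the termini are covered by fewer windows, so their averages rest on fewer segment predictions.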
Next, it is important to consider whether a residue is at the surface of the protein structure (e.g. from the PDB or AlphaFold (AF)), since this is a prerequisite for it to engage in membrane-binding interactions. Our method accounts for this by calculating the solvent-accessible surface area (SASA) for every individual residue from the protein structure (using the Shrake-Rupley algorithm (Shrake and Rupley 1973) as implemented in BioPython (Cock et al. 2009), see Fig. 1C), and then taking a local average over the 9-residue vicinity centred on residue n, with the resulting SASA value being assigned to residue n.
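The local SASA averaging can be illustrated with a short sketch that operates on a list of per-residue SASA values (for example, as obtained from BioPython's Shrake-Rupley implementation, which reports areas in Å²; 1 nm² = 100 Å²). The handling of chain termini, where fewer than nine neighbours exist, is an assumption here (the window is simply truncated), and the 0.8 nm² default is the web server's default threshold.

```python
from typing import List


def smooth_sasa(sasa_nm2: List[float], window: int = 9) -> List[float]:
    """Assign to each residue n the mean SASA of its `window`-residue vicinity.

    Assumption: near the termini the window is truncated to the residues
    that actually exist.
    """
    half = window // 2
    smoothed = []
    for n in range(len(sasa_nm2)):
        lo = max(0, n - half)
        hi = min(len(sasa_nm2), n + half + 1)
        neighbourhood = sasa_nm2[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def is_accessible(smoothed_nm2: List[float],
                  threshold_nm2: float = 0.8) -> List[bool]:
    """Flag residues whose locally averaged SASA exceeds the threshold."""
    return [value > threshold_nm2 for value in smoothed_nm2]


if __name__ == "__main__":
    raw = [0.2, 0.3, 1.5, 1.8, 1.2, 0.1, 0.0, 0.4, 1.1, 1.3]  # made-up values
    print(is_accessible(smooth_sasa(raw)))
```

Averaging over the vicinity prevents a single buried residue inside an otherwise exposed helix from fragmenting the predicted region.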
For intuitive visualization, the predicted sensing/binding activity can be mapped and colour-coded onto the 3D protein structure (right panel in Fig. 1C). To this end, we use the B-factor field in the output PDB-file that is commonly used for applying colour schemes in molecular visualization software (e.g. PyMOL (Schrödinger, LLC 2015)). Predicted non-binding (or inaccessible), sensing, and binding regions get assigned B-factors of 0.0, 0.5, and 1.0, respectively.
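A minimal sketch of this B-factor mapping is shown below. It relies only on the fixed-column PDB format (chain ID in column 22, residue number in columns 23–26, B-factor in columns 61–66); the function name and the dictionary-based input are hypothetical choices for illustration, not the PMIpred implementation.

```python
from typing import Dict, Tuple

CLASS_TO_BFACTOR = {"-": 0.0, "S": 0.5, "B": 1.0}  # values from the text


def set_bfactors(pdb_text: str,
                 labels: Dict[Tuple[str, int], str]) -> str:
    """Overwrite the B-factor field of every ATOM/HETATM record.

    labels maps (chain_id, residue_number) to '-', 'S', or 'B';
    residues without an entry are treated as non-binding ('-').
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            key = (line[21], int(line[22:26]))
            bfactor = CLASS_TO_BFACTOR[labels.get(key, "-")]
            # Columns 61-66 (0-based slice 60:66) hold the B-factor as %6.2f.
            line = line[:60] + f"{bfactor:6.2f}" + line[66:]
        out.append(line)
    return "\n".join(out)
```

In PyMOL, a file written this way can then be coloured with a command such as `spectrum b`, which colours atoms by their B-factor value.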
Results and discussion
Predicting membrane interaction for peptides
To apply the trained transformer model on individual peptide sequences, PMIpred allows users to input a string of 7–24 natural amino acids (single-letter abbreviations). Also, they need to specify if their target membrane is neutral (e.g. pure POPC) or negatively charged (as is common in nature), see Fig. 2A. In case of the latter, Equation (2) is applied. Upon submission, the web server will produce:
Figure 2.
(A) Users can input a peptide sequence (part of ArfGAP1’s ALPS2 motif (Mesmin et al. 2007), in this example) and choose their desired target membrane. (B) Helical wheel diagram, generated by modlAMP (Müller et al. 2017). (C) Predicted free-energy values and physicochemical characteristics. (D) The position of the input sequence on the continuum from non-binding to binding. Predicted sensors fall in the orange transition zone. (E) Membrane-binding probability as a function of vesicle radius R. Again, sensors fall in the orange area.
A helical wheel diagram of the input sequence, generated by the modlAMP python module (Müller et al. 2017) (Fig. 2B). This diagram visualizes the spatial orientation of the amino acid side chains once folded into an α-helix. This can be especially useful to, for example, identify a clustering of hydrophobic residues at one side of the helix, which increases the peptide’s amphipathicity and generally favours interactions with lipid bilayer surfaces. Please note that PMIpred, just like the MD simulations that the model is trained on, always assumes an α-helical folding for input sequences. Additional MD data on helical and unstructured peptides suggest that membrane-binding free energies are dominated by the general amino acid composition, and not secondary structure (see S2). Nevertheless, if an input peptide is known to have a non-helical fold (and, potentially, related function), the PMIpred results should be interpreted with caution.
Predicted free energies and several commonly used physicochemical descriptors (Fig. 2C). From top to bottom: (1) The NN-predicted curvature-sensing free energy (ΔΔF). The greater the magnitude of ΔΔF, the higher the preference for the peptide to bind to (strongly curved) membranes. (2) The length-adjusted ΔΔF (Equation (1)), which enables fair comparison for peptides with different lengths. (3) The charge-adjusted ΔΔF (Equation (2)), which additionally corrects for charge-sensing effects on negatively charged membranes. (4) The predicted membrane-binding free energy to vesicles with a 50 nm radius (Equation (3)). Depending on the choice of target membrane (neutral or negatively charged), the length-adjusted or the charge-adjusted ΔΔF is used for the calculations, respectively. Finally, (5) much like tools such as HeliQuest (Gautier et al. 2008), PMIpred computes some physicochemical descriptors: sequence length, charge, mean hydrophobicity (using Fauchère and Pliška’s hydrophobicity scale (Fauchère and Pliška 1983)), and Eisenberg’s hydrophobic moment (Eisenberg et al. 1982).
The membrane-binding probability as a function of the curved-membrane-binding free energy at a 50 nm vesicle radius, as also plotted in Fig. 1A. PMIpred depicts the position of the query sequence on this curve (Fig. 2D).
The membrane-binding probability as a function of vesicle radius R (and thus membrane curvature), see Fig. 2E. Peptides that fall in the purple area of this plot are predicted to be non-binders that prefer to stay in solution regardless of the curvature of available membranes. Conversely, the red zone would represent peptides that bind to any membrane, regardless of curvature. The orange transition zone is the sensing regime, where the binding probability strongly depends on the membrane curvature. In this plot, a peptide (and its associated free-energy value) is represented by a curve, as demonstrated for our example peptide.
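Of the descriptors listed above, the mean hydrophobicity and the hydrophobic moment are easy to reproduce. The sketch below assumes an ideal α-helix with 100° between consecutive residues and a moment normalized by sequence length; whether PMIpred applies the same normalization is an assumption. The hydrophobicity scale is passed in by the caller, since the Fauchère and Pliška values are not reproduced here.

```python
import math
from typing import Mapping

HELIX_ANGLE_DEG = 100.0  # angular step per residue in an ideal alpha-helix


def mean_hydrophobicity(sequence: str, scale: Mapping[str, float]) -> float:
    """Average per-residue hydrophobicity according to the supplied scale."""
    return sum(scale[aa] for aa in sequence) / len(sequence)


def hydrophobic_moment(sequence: str, scale: Mapping[str, float],
                       angle_deg: float = HELIX_ANGLE_DEG) -> float:
    """Length-normalized hydrophobic moment for a helical projection.

    Each residue contributes a vector of length H_i pointing at angle
    i * angle_deg; the moment is the magnitude of the vector sum.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(sequence))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum) / len(sequence)


if __name__ == "__main__":
    toy_scale = {"L": 1.0, "K": -1.0}  # placeholder values, not a published scale
    print(hydrophobic_moment("LKKLLKKLLKKL", toy_scale))
```

A high moment combined with a moderate mean hydrophobicity indicates an amphipathic helix, i.e. hydrophobic residues clustered on one face as in the helical wheel diagram.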
Screening proteins for membrane-interaction motifs
As outlined previously, PMIpred can also be applied to predict which regions in PMPs take part in curvature-sensing or membrane-binding behaviour. To demonstrate this feature, we highlight three example protein structures from different protein families.
ArfGAP1
First, we examined ArfGAP1 (Bigay et al. 2003), which comprises a structured Zn-finger domain to control GTP hydrolysis in Arf1 on the Golgi membrane (Cukierman et al. 1995). This catalytic domain is followed by a disordered region that contains two ALPS motifs (Mesmin et al. 2007), which fold into α-helices upon binding to curved membranes. Hereby, they couple the COPI-induced curvature to the catalytic activity and control the transport vesicle budding process. In the absence of a PDB entry for the full ArfGAP1 protein, we downloaded the predicted structure from the AlphaFold (AF) database (Jumper et al. 2021, Varadi et al. 2022), which accurately models the distinct structured and unstructured domains (although for the latter, the relative orientation between the flexible loops is likely wrong). We applied our protein screening tool to ArfGAP1 and found that it correctly identifies the two curvature-sensing motifs ALPS1 and ALPS2 (Fig. 3A). Parts of both ALPS motifs were labelled as ‘binders’, although they only slightly surpassed the threshold between sensing and binding for most positions (see data repository at https://github.com/nvanhilten/PMIpred). Interestingly, our model also identified a third region (residues 134–151, directly adjacent to the catalytic Zn-finger domain) to potentially contribute to (curvature-specific) membrane interactions. To the best of our knowledge, this segment is not described to be involved in membrane interaction in the current literature, which could mean that either this is a false positive identification, or it is an actual undiscovered membrane-interacting motif that would be interesting to examine further.
Figure 3.
(A–C) Screening results for ArfGAP1 (A, AF: Q8N6T3), the ENTH domain from epsin1 (B, PDB: 5onf), and the N-BAR domain from Bin1 (C, AF: Q9BTH3). Insets show sequences and corresponding predicted classification of putative active regions. Purple: non-binding or SASA below threshold (-). Orange: curvature sensing (S). Red: membrane binding (B). The known membrane-interaction regions are marked in grey. (D) Comparing PMIpred to DREAMM, PPM3, and MODA.
Epsin1 ENTH
The second protein we studied is the N-terminal ENTH domain from epsin1, which is responsible for its binding to anionic phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2) lipids in the inner leaflet of the plasma membrane (Itoh et al. 2001). Once bound, epsin1 facilitates endocytosis by inducing positive membrane curvature and driving the assembly of clathrin coat components (Ford et al. 2002). Previous studies have shown that ENTH favours curved membranes (Capraro et al. 2010), but also interacts with flat membranes to drive curvature by shallow insertion of its N-terminal helix, named helix-0 (Stahelin et al. 2003, Kweon et al. 2006, Lai et al. 2012). Consistent with these findings, our screening method successfully highlighted helix-0 as a membrane binder (Fig. 3B). Besides the crucial insertion of hydrophobic residues (e.g. L6 (Ford et al. 2002)), helix-0 has a net charge of +4 to complement the negatively charged PI(4,5)P2-rich membrane domains that it binds to. The correct identification of this helix in the ENTH domain indicates that, even though the simulations underlying our model were all performed on neutral POPC membranes, the charge correction (Equation (2)) we apply is sufficiently accurate to distill these effects.
Bin1 N-BAR
Finally, we screened the N-BAR domain from Bin1, which is a crescent-shaped protein that polymerizes to drive the remodelling of membranes into tubes. To this end, we took the predicted structure of the full Bin1 protein from the AF database, and isolated the N-BAR domain (residues 1–239). Our screen highlighted part of the N-terminal helix (just like for ENTH, named helix-0), classifying it as a curvature sensor (Fig. 3C). When comparing this to the extensive literature on Bin1’s helix-0, we see that, contrary to our method’s classification, it behaves as a binder in liposome experiments, which show that its membrane binding is not curvature-dependent (Fernandes et al. 2008) and that both the full Bin1 N-BAR domain and the amphipathic helix-0 alone can induce tubulation (Löw et al. 2008). Given that the BAR domain by itself (so, without helix-0) also induces liposome tubulation (Peter et al. 2004), the current understanding is that helix-0 is responsible for the initial docking and positive curvature induction, which then triggers binding of the full crescent-shaped BAR domain and the subsequent polymerization of the full protein scaffold (Adam et al. 2015, Simunovic et al. 2015). Additionally, our screening tool predicted three other sensing/binding domains that were not sufficiently solvent exposed (SASA < 0.8 nm²) to be highlighted in the results (see data repository at https://github.com/nvanhilten/PMIpred). Two of these domains are indeed located at the membrane-binding face of the concave N-BAR domain.
Taking our screening results for ArfGAP1, epsin1, and Bin1 together, we conclude that our method was indeed able to correctly pick out the important regions from these proteins (Fig. 3A–C). However, the sensing/binding classification is not perfect. This likely stems from a combination of inaccuracies in the model and inconsistencies between experimental setups. After all, curvature sensing and membrane binding will be strongly affected by liposome formulations, peptide concentrations, and potential synergistic effects between proteins that may differ between experiments and are not accounted for in our simulations.
Comparison with other tools
Current tools that predict membrane-interaction sites in proteins (DREAMM (Chatzigoulas and Cournia 2022a,b), PPM3 (Lomize et al. 2022), and MODA (Kufareva et al. 2014)) produce output on the single-residue level, whereas PMIpred quantifies membrane-binding regions; averaging the free energy over the local surrounding of a given amino acid. This complicates a one-to-one comparison with these qualitative prediction methods on the single-residue level (e.g. as done in ref. (Chatzigoulas and Cournia 2022b)). Therefore, we proceeded to benchmark PMIpred against the other web servers in the following manner: we composed a test set of 27 PMPs, including all 19 proteins from Chatzigoulas and Cournia’s test set (Chatzigoulas and Cournia 2022b) (mainly comprising lipid kinases and lipid transfer proteins) and eight known lipid packing defect sensing proteins (two PDB structures and six AF predictions); see Supplementary Table S2 for details. We treated the PDB and AF structures in the latter test set separately, because DREAMM, PPM3, and MODA were not designed to be applied to (partly unstructured) AF predictions.
For all these protein structures, we segmented the sequences into consecutive fragments of 15 amino acids (matching PMIpred’s default window size), which were marked positive if they contained at least one residue that was predicted to be membrane-interacting (either sensing or binding for PMIpred), and negative if none of the 15 residues in the fragment was labelled as a ‘hit’. We argue that, although it provides fundamentally different insights than, e.g. DREAMM, this ‘protein-region’ way of looking at membrane-binding interactions is more akin to the way a structural biologist may look at a protein structure, with loops or helices (rather than independent individual residues) dynamically inserting into the membrane leaflet.
We performed this segment-based binary classification analysis and calculated the Matthews correlation coefficient (MCC, see Supplementary Section S4) for the four methods on the different test sets (Fig. 3D). We observed that all tools score similarly on the full test set (MCC ). When only considering the proteins in Chatzigoulas and Cournia’s test set, we see that PMIpred is outperformed, mainly by DREAMM and PPM3, due to the fact that these tools are—by design—focused on predicting the single-residue protein-membrane interactions that are typical for the types of PMPs in this test set. In contrast, PMIpred performed much better when predicting the (often less strictly defined) membrane-interacting regions of lipid packing defect sensors, both for the PDB and the AF structures. Details on all results described in this section are provided in Supplementary Table S3 and the data are accessible at https://github.com/nvanhilten/PMIpred.
In general, we note that the majority of erroneous predictions are false positives rather than false negatives, with a false discovery rate (FDR) of 60% and a false omission rate (FOR) of only 4.4% for PMIpred (and similar values for the other web servers, see Supplementary Section S4).
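The segment-based evaluation described in this section can be summarized in a short sketch. The function names are illustrative, and the treatment of a final fragment shorter than 15 residues (kept as its own fragment) is an assumption; the MCC, FDR, and FOR formulas are the standard definitions.

```python
import math
from typing import List, Sequence, Tuple


def segment_labels(residue_hits: Sequence[bool], size: int = 15) -> List[bool]:
    """Mark each consecutive fragment positive if any of its residues is a hit."""
    return [any(residue_hits[i:i + size])
            for i in range(0, len(residue_hits), size)]


def confusion(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for paired fragment labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    return tp, fp, tn, fn


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient (0.0 if any marginal is empty)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def false_discovery_rate(tp: int, fp: int) -> float:
    """FP / (FP + TP): fraction of predicted positives that are wrong."""
    return fp / (fp + tp) if (fp + tp) else 0.0


def false_omission_rate(tn: int, fn: int) -> float:
    """FN / (FN + TN): fraction of predicted negatives that are wrong."""
    return fn / (fn + tn) if (fn + tn) else 0.0
```

Because membrane-interacting fragments are a small minority in most proteins, the MCC is a more informative summary than plain accuracy for this kind of imbalanced classification.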
Taken together, we argue that PMIpred’s predictions are generally on par with state-of-the-art tools like DREAMM, PPM3, and MODA, with the important improvement that PMIpred additionally provides quantitative information on membrane-interaction propensity per residue in actual thermodynamic terms (free energies), in the sense that one can directly rank and quantitatively compare its numerical predictions. In contrast, DREAMM and PPM3 act as binary classifiers and MODA only yields membrane-interaction ‘likelihoods’, which are hard to interpret biophysically. We should note, however, that direct comparison of the predicted free-energy values with experiments is complicated for two reasons: (1) their value is strongly dependent on the degree of stretching (curvature) in the target membrane (as discussed previously (van Hilten et al. 2022)), and (2) the contribution of the entropy is typically underestimated in free energies that are calculated from simulations.
Because PMIpred is trained on bottom-up physics-based data, it is likely more self-consistent at generalizing those concepts to undiscovered or de novo designed protein structures that are highly distinct from typical known PMPs, compared to data-informed models that are trained on protein databases. To this end, we suggest that the physics-informed transformer NN we developed here may be able to additionally boost the performance of prediction methods based on ensemble machine learning approaches such as DREAMM by serving as an additional, independent (strong) classifier, especially since these approaches are notably orthogonal. Finally, we emphasize that users now have multiple different tools at their disposal and encourage them to use these various approaches to compare results and draw conclusions based on the ensemble of results.
Try it yourself
Our protein screening tool is accessible within the PMIpred web server. Here, users can submit a PDB-file or PDB ID and also tune the size of the sliding window and the SASA threshold value (Fig. 4A). Similar to PMIpred’s peptide module, users can specify if their target membrane is neutral (e.g. POPC) or negatively charged, as is common in biology. Again, if the latter is chosen, charge correction (Equation (2)) is applied to the predicted free energy to account for additional Coulomb interactions and concomitant ‘charge sensing’.
Figure 4.
(A) The module takes four inputs: (1) The size of the sliding window (from 7 to 24 residues, default = 15). (2) The SASA threshold (default = 0.8 nm²), above which a residue is considered sufficiently solvent-accessible to contribute to membrane interactions. (3) The target membrane. (4) An uploaded PDB-file or PDB ID. (B) Output includes: (5) A downloadable zip-file containing all output files that are also visualized on this page. (6) An interactive visualization of the protein structure using 3Dmol (Rego and Koes 2015). Predicted non-binding, sensing, and binding regions are coloured purple, orange, and red, respectively. (7) An output table containing the full protein sequence and the annotated classification: B, binder; S, sensor; -, non-binder or not accessible (SASA below threshold). (C) A table containing the free energies and classifications of the individual segments that were screened. (D) A more detailed look into the data underlying the table in (B), including the SASA, free-energy values, and classification per residue.
After submission, the results are displayed in the browser (Fig. 4B) and available to be downloaded as a zip-file. The processed protein is displayed interactively on the results page, using the aforementioned colouring: purple, orange, and red, for the non-binding (or buried) regions, predicted sensing domains, and membrane-binding domains, respectively. The data underlying this colouring are listed per segment (Fig. 4C) and per residue (Fig. 4D).
We temporarily store the latest 100 protein structure submissions on our local server under randomized file names to ensure anonymity.
Cautionary remarks and recommendations
Before concluding this paper, we wish to point out several cases in which PMIpred’s results should be interpreted with some extra caution.
We note that shorter peptides have a higher tendency to be classified as binders (Supplementary Fig. S5A). This effect can not be explained by poorer accuracy in the transformer’s prediction for shorter sequences, because the normalized error is constant over the full length range (7-24 residues, Supplementary Fig. S5B). Alternatively, we suggest that this bias is likely introduced by our length extrapolation step (Equation (1)) that intrinsically has a more pronounced effect for shorter peptides. Therefore, we advise users to interpret predictions on short peptides ( residues) with some caution.
Our charge-correction (Equation (2)) is based on a set of benchmark peptides from the experimental literature with charges ranging from to (van Hilten et al. 2023). Predictions for peptides outside this charge range should be considered with some extra caution.
The interplay between membrane binding and peptide folding is also linked to the lipid composition of the membrane. This aspect is simplified in the MD simulations on which our model is trained, because we only used pure POPC and POPC/POPE membranes. If a protein’s membrane binding behaviour is linked to interactions with a specific lipid species, PMIpred may underestimate the membrane affinity.
The N- and C-terminal residues of proteins are more likely to be identified as (false) positives by the PMIpred protein screening module, for two reasons: (1) the termini of proteins tend to have a high solvent accessibility, and (2) the per-residue free energy is averaged over fewer positions, and is thus possibly less accurate. Please note that the same applies to gaps in protein chains.
As mentioned before, PMIpred always assumes an α-helical fold. Although our MD simulations suggest that folding (helical vs random coil) is a less determinant factor than amino acid composition (Supplementary Fig. S4), we recommend users to take extra caution when working with peptides that feature a known functional non-helical secondary structure. Of course, this also applies to non-helical domains in full-length proteins.
When using predicted 3D structures (e.g. from AF), we advise submitting only structures with high confidence (pLDDT > 70), unless a section is known to be unstructured.
Within the protein context, PMIpred may falsely identify protein-protein interaction sites as membrane-binding regions. This misinterpretation arises because both phenomena are driven mostly by hydrophobic residues being exposed at the surface of the protein. For example, we observed this for the RAS-binding domain of PI(4,5)P2 3-kinase, which includes two exposed leucines, L233 and L234 (see Supplementary Table S3). We should emphasize here, however, that our method is based on peptide data, and that the membrane-interaction predictions for the peptide fragments that make up a protein-protein interaction domain may still be valid.
Applying PMIpred to lipid-anchored proteins (e.g. KRAS in section S6) requires some extra caution, because our web server only considers the protein’s amino acids and neglects lipidic membrane anchors (e.g. Cys-linked farnesyl or palmitoyl moieties).
Although PMIpred is not designed for it, it can also be applied to transmembrane proteins, since the physicochemical properties driving membrane-surface binding and membrane internalization are similar (mostly hydrophobicity). We do note, however, that it may miss transmembrane regions in multi-pass proteins (e.g. GPCRs) due to the crowded protein environment and consequent low ‘solvent’ accessibility. Generally, we found that PMIpred behaves similarly to DREAMM in this regard (Supplementary Fig. S7). When working on transmembrane proteins, we recommend using PMIpred in conjunction with other web servers that are specifically developed for this purpose (e.g. DeepTMHMM (Hallgren et al. 2022), TMAlphaFold (Dobson et al. 2023), or others).
Conclusions
We have developed a transformer NN that, with the sole input requirement of an amino acid sequence, can predict the relative binding free energy () for peptides interacting with curved membranes. Additionally, it provides a basic estimation of through linear interpolation. We implemented this model into a user-friendly web server, named PMIpred (https://pmipred.fkt.physik.tu-dortmund.de), where researchers can readily predict sensing free energies for any peptide, without requiring any simulation or coding experience.
As a second module in PMIpred, users can screen protein structures for putative curvature-sensing or membrane-binding motifs. To this end, a sliding window is applied to the protein sequence to calculate a free-energy contribution for every residue. By coupling this to the solvent-accessible surface area (SASA) of that residue, the module is able to map and visualize the predicted membrane-interaction activity onto the protein structure. We applied this method to a comprehensive and diverse test set of known PMPs, and show that PMIpred is on par with the state-of-the-art tools DREAMM, PPM3, and MODA. In contrast to these other web servers, PMIpred provides quantitative predictions in terms of relative membrane-interaction free energies that can be directly interpreted biophysically, and additionally enables users to distinguish curvature-sensing from membrane-binding motifs. PMIpred also quantifies the contribution (weight) of each individual amino acid to the predicted membrane binding, which guides the design of point mutations to experimentally probe protein-membrane interactions.
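The sliding-window mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the PMIpred implementation: the window scorer stands in for the transformer and is a toy function, the relative SASA values are made up, and the way the score is coupled to SASA (zeroing residues below a hypothetical exposure cut-off of 0.25) is our own simplification.

```python
from typing import Callable, Sequence


def per_residue_scores(
    seq: str, window: int, score_window: Callable[[str], float]
) -> list[float]:
    """Average, for each residue, the scores of all windows that contain it."""
    n = len(seq)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(seq)")
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        score = score_window(seq[start:start + window])
        for i in range(start, start + window):
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def mask_buried(
    scores: Sequence[float], rel_sasa: Sequence[float], min_exposure: float = 0.25
) -> list[float]:
    """Keep the score of exposed residues; set buried residues to zero.

    The cut-off and the hard masking are assumptions made for this sketch.
    """
    if len(scores) != len(rel_sasa):
        raise ValueError("scores and rel_sasa must have the same length")
    return [s if a >= min_exposure else 0.0 for s, a in zip(scores, rel_sasa)]


def toy_score(fragment: str) -> float:
    """Toy stand-in for the trained model: more F/I/L/W gives a more negative score."""
    return -float(sum(fragment.count(aa) for aa in "FILW"))


# Made-up six-residue example with a buried fourth residue.
scores = per_residue_scores("AAALLA", 3, toy_score)
mapped = mask_buried(scores, [1.0, 1.0, 1.0, 0.1, 1.0, 1.0])
print(scores)
print(mapped)
```

In practice, the relative SASA would be computed from the submitted structure, and the mapped values would be used to colour the structure per residue. Passing the scorer in as a function keeps the averaging logic independent of the model that produces the window scores.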
Since this method is trained on physics-based data (where experimental findings contribute only indirectly through the parametrization of the MD force field), we argue that it should be able to characterize membrane-interaction behaviour from a broadly applicable, fundamental perspective. In contrast, most current approaches are based on physicochemical descriptors or sequence ‘grammar rules’ that define known curvature-sensing motifs (e.g. ALPS (Drin et al. 2007)) but are unable to generalize their predictions to unrelated membrane-interacting motifs (e.g. α-synuclein and helix-0 of N-BAR and ENTH). When it comes to predicting the membrane-interaction sites in PMPs, current methods are mainly data-driven, and thus rely on experimental findings on natural proteins. Although these tools are already remarkably powerful, we argue that the unique physics-informed predictions by PMIpred can additionally boost the performance of existing ensemble classifiers such as DREAMM, especially because they provide insights from a completely different angle and the data sources are orthogonal. Ultimately, by making PMIpred easily accessible for both computational and experimental biologists, we aim to accelerate the discovery and characterization of curvature-sensing and membrane-binding peptides and protein motifs.
Supplementary Material
Acknowledgments
The authors would like to thank Agata Witkowska for fruitful discussions and feedback. Art Hoti is thanked for thoroughly testing the web server.
Contributor Information
Niek van Hilten, Leiden Institute of Chemistry, Leiden University, Leiden 2333 CC, Netherlands.
Nino Verwei, Leiden Institute of Chemistry, Leiden University, Leiden 2333 CC, Netherlands.
Jeroen Methorst, Leiden Institute of Chemistry, Leiden University, Leiden 2333 CC, Netherlands.
Carsten Nase, Department of Physics, Technical University Dortmund, Dortmund 44227, Germany.
Andrius Bernatavicius, Leiden Institute of Advanced Computer Science, Leiden University, Leiden 2333 CA, Netherlands; Leiden Academic Centre for Drug Research, Leiden University, Leiden 2333 CC, Netherlands.
Herre Jelger Risselada, Leiden Institute of Chemistry, Leiden University, Leiden 2333 CC, Netherlands; Department of Physics, Technical University Dortmund, Dortmund 44227, Germany.
Funding
The Dutch Research Organisation NWO (Snellius@Surfsara) and the HLRN Göttingen/Berlin are acknowledged for the provided computational resources. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC 2033—390677874—RESOLV. We also thank the NWO Vidi scheme (project number 723.016.005) for funding. We gratefully acknowledge the Gauss Centre for Supercomputing e. V. (www.gauss-centre.eu) for funding this project by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC).
Author contributions
NvH and HJR designed the research and wrote the manuscript. NvH, NV, JM, and AB wrote the code. NV and AB designed, trained, and optimized the transformer model. NV and NvH designed the web server. CN was responsible for hosting the web server, and provided technical assistance.
Conflict of interest
None declared.
Data availability
The data underlying this article are available at https://github.com/nvanhilten/PMIpred.
References
- Adam J, Basnet N, Mizuno N. Structural insights into the cooperative remodeling of membranes by amphiphysin/bin1. Sci Rep 2015;5:15452–15.
- Bigay J, Gounon P, Robineau S. et al. Lipid packing sensed by ArfGAP1 couples COPI coat disassembly to membrane bilayer curvature. Nature 2003;426:563–6.
- Capraro B, Yoon Y, Cho W. et al. Curvature sensing by the epsin n-terminal homology domain measured on cylindrical lipid membrane tethers. J Am Chem Soc 2010;132:1200–1.
- Chatzigoulas A, Cournia Z. DREAMM: a web-based server for drugging protein-membrane interfaces as a novel workflow for targeted drug design. Bioinformatics 2022a;38:5449–51.
- Chatzigoulas A, Cournia Z. Predicting protein-membrane interfaces of peripheral membrane proteins using ensemble machine learning. Brief Bioinform 2022b;23:bbab518.
- Cock P, Antao T, Chang J. et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics 2009;25:1422–3.
- Cukierman E, Huber I, Rotman M. et al. The ARF1 GTPase-activating protein: zinc finger motif and golgi complex localization. Science 1995;270:1999–2002.
- Davidson W, Jonas A, Clayton D. et al. Stabilization of alpha-synuclein secondary structure upon binding to synthetic membranes. J Biol Chem 1998;273:9443–9.
- Dobson L, Szekeres L, Gerdán C. et al. TmAlphaFold database: membrane localization and evaluation of AlphaFold2 predicted alpha-helical transmembrane protein structures. Nucleic Acids Res 2023;51:D517–D522.
- Drin G, Antonny B. Amphipathic helices and membrane curvature. FEBS Lett 2010;584:1840–7.
- Drin G, Casella J, Gautier R. et al. A general amphipathic alpha-helical motif for sensing membrane curvature. Nat Struct Mol Biol 2007;14:138–46.
- Eisenberg D, Weiss R, Terwilliger T. The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature 1982;299:371–4.
- Fauchère J, Pliška V. Hydrophobic parameters-pi of amino-acid side-chains from the partitioning of n-acetyl-amino-acid amides. Eur J Med Chem 1983;18:369–75.
- Fernandes F, Loura L, Chichón F. et al. Role of helix 0 of the N-BAR domain in membrane curvature generation. Biophys J 2008;94:3065–73.
- Ford M, Mills I, Peter B. et al. Curvature of clathrin-coated pits driven by epsin. Nature 2002;419:361–6.
- Gautier R, Douguet D, Antonny B. et al. HELIQUEST: a web server to screen sequences with specific α-helical properties. Bioinformatics 2008;24:2101–2.
- Hallgren J, Tsirigos K, Pedersen M. et al. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. bioRxiv 2022. doi: 10.1101/2022.04.08.487609.
- Hristova K, White SH. An experiment-based algorithm for predicting the partitioning of unfolded peptides into phosphatidylcholine bilayer interfaces. Biochemistry 2005;44:12614–9.
- Itoh T, Koshiba S, Kigawa T. et al. Role of the ENTH domain in phosphatidylinositol-4,5-bisphosphate binding and endocytosis. Science 2001;291:1047–51.
- Jumper J, Evans R, Pritzel A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.
- Kufareva I, Lenoir M, Dancea F. et al. Discovery of novel membrane binding structures and functions. Biochem Cell Biol 2014;92:555–63.
- Kweon D-H, Shin Y-K, Shin J. et al. Membrane topology of helix 0 of the epsin n-terminal homology domain. Mol Cells 2006;21:428–35.
- Lai C-L, Jao C, Lyman E. et al. Membrane binding and self-association of the epsin n-terminal homology domain. J Mol Biol 2012;423:800–17.
- Lomize A, Todd S, Pogozheva I. Spatial arrangement of proteins in planar and curved membranes by PPM 3.0. Protein Sci 2022;31:209–20.
- Löw C, Weininger U, Lee H. et al. Structure and dynamics of helix-0 of the n-bar domain in lipid micelles and bilayers. Biophys J 2008;95:4315–23.
- Mesmin B, Drin G, Levi S. et al. Two lipid-packing sensor motifs contribute to the sensitivity of ArfGAP1 to membrane curvature. Biochemistry 2007;46:1779–90.
- Monje-Galvan V, Klauda J. Peripheral membrane proteins: tying the knot between experiment and computation. Biochim Biophys Acta 2016;1858:1584–93.
- Müller A, Gabernet G, Hiss J. et al. modlAMP: python for antimicrobial peptides. Bioinformatics 2017;33:2753–5.
- Peter B, Kent H, Mills I. et al. BAR domains as sensors of membrane curvature: the amphiphysin BAR structure. Science 2004;303:495–9.
- Pranke I, Morello V, Bigay J. et al. α-synuclein and ALPS motifs are membrane curvature sensors whose contrasting chemistry mediates selective vesicle binding. J Cell Biol 2011;194:89–103.
- Rego N, Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics 2015;31:1322–4.
- Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8. 2015.
- Shrake A, Rupley J. Environment and exposure to solvent of protein atoms. lysozyme and insulin. J Mol Biol 1973;79:351–71.
- Simunovic M, Voth G, Callan-Jones A. et al. When physics takes over: BAR proteins and membrane curvature. Trends Cell Biol 2015;25:780–92.
- Stahelin R, Long F, Peter B. et al. Contrasting membrane interaction mechanisms of AP180 n-terminal homology (ANTH) and epsin n-terminal homology (ENTH) domains. J Biol Chem 2003;278:28993–9.
- van Hilten N, Stroh K, Risselada H. Efficient quantification of lipid packing defect sensing by amphipathic peptides: comparing martini 2 and 3 with CHARMM36. J Chem Theory Comput 2022;18:4503–14.
- van Hilten N, Methorst J, Verwei N. et al. Physics-based generative model of curvature sensing peptides; distinguishing sensors from binders. Sci Adv 2023;9:eade8839.
- Varadi M, Anyango S, Deshpande M. et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 2022;50:D439–D444.
- Vaswani A, Shazeer N, Parmar N. et al. Attention is all you need. Adv Neural Inf Process Syst 2017;30:6000–10.
- Whited A, Johs A. The interactions of peripheral membrane proteins with biological membranes. Chem Phys Lipids 2015;192:51–9.
- Wimley W. Describing the mechanism of antimicrobial peptide action with the interfacial activity model. ACS Chem Biol 2010;5:905–17.
- Xu Y, Mitra B. A highly active, soluble mutant of the membrane-associated (s)-mandelate dehydrogenase from pseudomonas putida. Biochemistry 1999;38:12367–76.