Skip to main content
ACS AuthorChoice logoLink to ACS AuthorChoice
. 2026 Feb 2;66(4):2299–2310. doi: 10.1021/acs.jcim.5c02566

Decoding Protein–Membrane Binding Interfaces from Surface-Fingerprint-Based Geometric Deep Learning and Molecular Dynamics Simulations

ByungUk Park , Reid C Van Lehn †,‡,*
PMCID: PMC12933713  PMID: 41622943

Abstract

Predicting protein–membrane interactions is a formidable challenge due to the subtle physicochemical features that distinguish membrane-binding regions of a protein surface as well as the scarcity of experimentally resolved membrane-bound protein conformations. Here, we present MaSIF-PMP, a geometric deep learning model that leverages molecular surface fingerprints to predict interfacial binding sites (IBSs) of peripheral membrane proteins (PMPs). MaSIF-PMP integrates geometric and chemical surface features to produce spatially resolved IBS predictions. Compared to existing models, MaSIF-PMP achieves superior performance for IBS classification, while feature ablation studies and transfer learning analyses reveal distinct determinants governing protein–membrane versus protein–protein interactions. We further show that molecular dynamics (MD) simulations can validate model predictions, refine IBS labels, and capture composition-dependent membrane binding patterns. These results establish MaSIF-PMP as an effective framework for IBS prediction and highlight the potential of incorporating conformational dynamics from MD to improve both the model accuracy and biological interpretability.


graphic file with name ci5c02566_0009.jpg


graphic file with name ci5c02566_0008.jpg

1. Introduction

Peripheral membrane proteins (PMPs) transiently bind to the surface of lipid membranes to mediate crucial cellular processes such as signaling, trafficking, apoptosis, lipid metabolism, and immunity. While PMPs exhibit a variety of molecular weights and geometries, their reversible association with the membrane is dictated by common features such as the membrane lipid environment, ions like Ca2+, and the polarity and geometry of the protein itself. PMPs bind to lipid membranes via noncovalent interactions with residues present at a region of the PMP surface termed the interfacial binding site (IBS). IBSs typically have a mixture of basic and hydrophobic amino acids that mediate nonspecific electrostatic interactions and penetration into the membrane, respectively. Discovering and characterizing these IBSs is important for understanding biological mechanisms and uncovering novel sites for therapeutic intervention by modulating PMP–membrane interactions. However, identifying IBSs is challenging since their physicochemical properties, such as the hydropathy of IBS residues, are similar to nonbinding regions of the PMP surface. Available PMP structures are also often determined from soluble states since stabilizing membrane-bound states is difficult with contemporary experimental techniques, which impairs identification of IBSs. ,,− Alternatively, mechanistic studies of PMPs have been performed using molecular dynamics (MD) simulations, but the time required to sample PMP–membrane interactions and identify the corresponding IBS is tremendously large using all-atom MD simulations. , New methods are needed to rapidly characterize IBSs in order to guide the discovery of PMP-targeting drugs, the design of biosensors with enhanced binding specificity and sensitivity, and the mechanistic analysis of PMP–membrane interactions.

Recently, advances in machine learning have greatly improved the accuracy and accessibility of protein structure prediction and design. Significant progress has been made in modeling protein interactions with diverse species, including other proteins, nucleic acids, small molecules, and metal ions. Machine learning models thus have the potential to rapidly screen large sets of PMPs to identify IBSs based on protein sequence or structural features. However, most existing models remain limited to predicting interactions between a protein and a single target ligand or other soluble species, as opposed to interactions with a membrane surface. Models that have been developed to identify IBSs have primarily sought to classify individual residues as binding or nonbinding. Such approaches lack explicit information about protein surface regions and their associated chemical properties that mediate protein–membrane interactions. One surface-centered machine learning method, referred to as molecular surface interaction fingerprinting (MaSIF), has emerged as a versatile approach to predict varied protein interactions by representing the surface of a biomolecule (e.g., protein, peptide, ligand) as a set of patches, each comprising numerical descriptors of geometric and chemical features. This approach has demonstrated wide applicability across diverse protein interactions (e.g., with small molecules and protein) and has enabled the de novo design of peptides that bind to specific protein surfaces. , A MaSIF-based approach has also successfully designed peptides that target novel binding sites, or “neosurfaces”, that arise from protein–ligand complexes, thus expanding the target space for drug discovery. , These findings highlight the unique generalizability of surface fingerprint descriptors for predicting molecular interactions.

In this study, we introduce MaSIF-PMP as a geometric deep learning model that extends the MaSIF approach to predict IBSs. Because protein–membrane interactions are governed by a subtle interplay of geometric and chemical surface features, including curvature, electrostatic potentials, hydrophobicity, and hydrogen-bonding patterns, that also influences protein–protein interactions, we hypothesized that MaSIF descriptors could be applied for IBS prediction. By training on a data set of nearly 1200 PMPs, we show that MaSIF-PMP identifies IBSs with comparable accuracy to prior predictions of protein–protein interactions and outperforms state-of-the-art IBS prediction methods on a held-out test set. Unlike existing residue-based IBS prediction methods, the model generates surface-level predictions, enabling a spatially resolved landscape of potential binding regions. We further show that all-atom MD simulations can complement MaSIF-PMP predictions by validating model predictions and generating robust IBS labels. These results showcase MaSIF-PMP as a potentially powerful method for identifying IBSs, advancing understanding of protein–membrane interactions, and enabling the design of drug-like molecules to target PMPs.

2. Results

2.1. Predicting Interfacial Binding Sites Using Surface Fingerprints

We developed MaSIF-PMP as a geometric deep learning approach that operates on PMP molecular surface features to identify IBSs. Figure presents an overview of MaSIF-PMP, including surface feature preprocessing, model architecture, and interface labeling strategy. Each protein surface is discretized into a series of points, and five geometric and chemical surface features are computed for each point. Points are then assigned to a series of overlapping patches of fixed geodesic radius with one patch centered on each point. Points within each patch are further assigned geodesic polar coordinates with respect to the patch center (Figure a). The features and coordinates of the points within each patch are mapped onto learned soft polar grids, producing numerical arrays that are processed through a series of convolutional neural network (CNN) layers to yield a surface fingerprint for each patch (Figure b). The resulting surface fingerprint is passed to multilayer perceptron (MLP) layers to output a score, ranging from zero to one, with higher scores indicating greater confidence that the patch is part of an IBS. Computing scores for all patches permits analysis of the likelihood of finding an IBS for all regions across the protein surface. Additional details on the surface featurization are included in the Methods.

1.

1

Overview of the MaSIF-PMP approach. (a) Protein molecular surfaces are generated from atomic coordinates obtained from the Protein Data Bank and discretized into overlapping patches of fixed geodesic radius. Five geometric and chemical surface features, along with geodesic polar coordinates, are computed for each surface point. (b) Model architecture consisting of three convolutional layers that process each patch and output a score for the patch center. Scores range from zero to one, with higher scores indicating greater confidence that the patch is part of an interfacial binding site (IBS). (c) Ground-truth labels are assigned to surface points based on residue-level IBS annotations provided in a literature data set of 1189 PMPs. IBS-labeled residues and points are colored red, whereas non-IBS regions are colored blue. Several elements in (a) and (b) follow similar styles as the original MaSIF publication.

To train MaSIF-PMP, we identified an existing data set of 1,189 experimentally determined PMP structures with IBS annotations inferred via structural homology. Ground-truth IBS labels for model training and evaluation were derived from residue-specific annotations provided in the data set and mapped to corresponding surface points (Figure c). Specifically, each surface point was assigned the name and index of its nearest residue, allowing IBS labels to be mapped from the corresponding data set annotations. Henceforth, we use the term “residue-level” when labeling individual amino-acid residues as being part of the IBS and the term “surface-level” when labeling individual surface points (and corresponding patches) as being part of the IBS. We split the surface-level labeled structures into a training set with 1059 structures and test set with 130 structures that were held out during model training. Proteins were split by superfamily to preserve structural diversity, which was particularly important because the IBS labeling strategy for this data set assumes that binding patterns are conserved within a superfamily.

We trained MaSIF-PMP for 50 epochs using static crystal structures obtained from the RCSB Protein Data Bank. The model at each epoch was saved based on the mean receiver operating characteristic area under the curve (ROC AUC) across a 105-protein validation set, which was set aside from the training set while maintaining similar superfamily distributions. ROC AUC is a threshold-independent metric ranging from 0 (perfect misclassification) to 0.5 (random) to 1 (perfect classification) and provides a more robust assessment of binary classifiers than metrics such as accuracy or precision (Supporting Figure 6). For each protein, the per-protein ROC AUC was computed by comparing surface-level predictions with ground-truth labels, and the mean plateaued at epoch 48 (Supporting Figure 4). On the 130-protein test set, MaSIF-PMP achieved a median per-protein ROC AUC of 0.78 for surface-level predictions (Figure a), which is comparable to the performance of MaSIF when predicting protein–protein interaction sites (referred to as MaSIF-site). The most accurate MaSIF-PMP test set prediction was for the perforin C2 domain, which achieved a ROC AUC of 1.0 with predicted IBS scores closely matching the ground-truth labels (Figure b).

2.

2

Prediction accuracy of MaSIF-PMP for a 130-protein test set. (a) Distribution of per-protein ROC AUC scores. Each score is computed for an entire protein by comparing surface-level predicted scores to ground-truth labels. (b) Snapshots of ground-truth labels and surface-level predicted scores by MaSIF-PMP for the top-performing PMP, perforin C2 domain (PDB ID: 4Y1T, chain A). IBS scores range from 0 (for nonbinding regions) to 1 (for the IBS) and are visualized using a blue-to-red color scale with the color assigned to the point at the center of each labeled patch.

2.2. Benchmarking MaSIF-PMP against Alternative Methods

To further assess the prediction performance of MaSIF-PMP, we conducted a benchmark comparison against state-of-the-art tools for IBS prediction. Most existing methods, including DREAMM, PPM3, and MODA, produce residue-level predictions, while PMIpred, a physics-informed model, generates predictions at the level of 15-residue segments. In contrast, MaSIF-PMP outputs surface-level predictions. To compare between models, we mapped MaSIF-PMP predictions from surface- to residue-level and likewise mapped predictions from other models from residue- to surface-level (Supporting Figure 6). This benchmark was performed using 21 PMPs from the PMIpred benchmark set with corresponding subunit structures and their ground-truth IBS residue labels. For consistency, we updated the labels of three PMPs (PDB IDs: 1JSS, 2RSG, 1LN1) from the benchmark setwhich were also included in our test set but annotated with broader IBS regionsto use the same labels as in our test set. The same MaSIF-PMP model introduced in the previous section was used for the benchmark without retraining. PMIpred, DREAMM, and PPM3, were selected for the benchmark comparison (Supporting Information). Performance was evaluated using ROC AUC and the Matthews correlation coefficient (MCC; see Equation S5), calculated over surface points and surface-exposed residues only. The MCC was included as an evaluation metric due to its suitability for imbalanced classification tasks; in our case, nonbinding regions greatly outnumber IBS regions. MCC values range from −1 to +1, with +1 indicating perfect classification, 0 corresponding to random guessing, and −1 indicating the perfectly incorrect classification.

MaSIF-PMP outperformed all other predictors in both per-protein residue-level ROC AUC and MCC. Kernel density estimates of the ROC AUC (Figure a) and MCC (Figure b) show a clear shift to higher values for MaSIF-PMP compared to other predictors, reflecting an improved overall performance. Consistently, MaSIF-PMP achieved higher median per-protein, residue-level ROC AUC and MCC values (Figure c) as well as per-protein surface-level evaluations (Supporting Figures 7 and 8). Together, these results demonstrate that MaSIF-PMP surpasses existing IBS predictors across multiple metrics, underscoring its accuracy and robustness in handling imbalanced labels.

3.

3

Benchmark comparison of MaSIF-PMP against other IBS predictors assessed by the ROC AUC and MCC for 21 single-chain PMPs. Distributions of per-protein, residue-level (a) ROC AUC and (b) MCC predicted by MaSIF-PMP, PMIpred, DREAMM, and PPM3. Solid lines represent Gaussian kernel density estimates fit to discrete score distributions. (c) Comparison of MaSIF-PMP with other IBS predictors on the benchmark proteins. Results are reported as the median ROC AUC and median MCC per protein, evaluated on a per-residue basis to ensure comparability across the predictors.

Surface-level comparisons (Figure ) further demonstrate that MaSIF-PMP predicts broader interaction interfaces than PMIpred, DREAMM, and PPM3. In contrast to the binary outputs of other predictors, MaSIF-PMP generates continuous prediction scores that can be visualized as surface color gradients. Given that protein–membrane interactions arise from extended protein–membrane surface contacts, our surface-level approach provides a more realistic representation of binding interfaces than residue-level predictions that are limited to a small subset of anchoring residues. For instance, in the case of sphingomyelinase C (Figure c), the residue-level predictors achieved higher ROC AUC and MCC values because the annotated interface constitutes only a small portion of the protein surface. However, the broader IBS predicted by MaSIF-PMP likely reflects regions that approach the membrane during interaction and, thus, contribute to protein–membrane interactions, as further discussed in the context of MD simulations below. MaSIF-PMP can also produce binary outputs by applying a defined threshold, enabling deployment in the same manner as other predictors to designate specific surface patches as IBS regions (Supporting Figure 10). Together, these results show that MaSIF-PMP not only outperforms existing IBS predictors but also provides a more holistic view of the protein–membrane interface and the underlying interaction process.

4.

4

Visualization of the ground-truth and predicted IBSs for three PMPs selected from the benchmark test set. Each column shows PMP molecular surfaces with colors indicating binary labels (0 for nonbinding surface points or 1 for the IBS) for the ground-truth, PMIpred, DREAMM, and PPM3 columns and continuous interface prediction scores (from 0 to 1) for MaSIF-PMP following the color scheme in Figure . ROC AUC and MCC values are computed based on surface-level predictions for all three models. (a) Intestinal fatty acid binding protein (PDB ID: 3AKM). (b) (S)-mandelate dehydrogenase (PDB ID: 6BFG). (c) Sphingomyelinase C (PDB ID: 2DDR).

2.3. Analysis of Interplay between Surface Features in Protein–Membrane Interactions

To assess the relative contributions of surface features to IBS predictions, we performed an ablation study by conducting 5-fold cross-validation on the complete 1,189 PMP data set using models trained with different subsets of surface features. We hypothesized that the network trained on the most critical features would yield the highest median ROC AUC scores. As shown in Figure a, among models trained on individual feature types, the network using only geometric features (shape index and distance-dependent curvature) achieved the highest ROC AUC of 0.72, followed by models trained using only hydropathy, electrostatic potential, and hydrogen bonding features. The prediction performance of the network trained with two geometric features was comparable to that of the model trained with all three chemical features, underscoring the dominant role of geometric information in defining IBS patterns. This finding contrasts with protein–protein interaction predictions obtained using MaSIF-site, for which single chemical features outperformed geometric features. Furthermore, we observed a cumulative improvement in predictive accuracy as additional features were added as input for training, with the highest performance achieved when all five surface features were included (Supporting Figure 11). These results indicate that geometric features are more critical than individual chemical features for IBS prediction in PMPs, consistent with prior studies highlighting hydrophobic protrusions as key structural determinants of PMP IBSs. , We further analyzed the distribution of surface feature between IBS and non-IBS patches (Supporting Figure 12), and consistent with the ablation results the distributions of geometric and hydrophobicity features slightly shift for IBS vs non-IBS patches. Additional definitions of ‘hydrophobic-protruding patches’ based on positive geometric and hydrophobicity features showed that such patches are enriched at IBSs (Supporting Table 5), indicating that MaSIF-PMP can recognize physically meaningful features of IBSs. Nevertheless, the fact that integrating all five features maximizes the prediction performance underscores the subtle and complex interplay between geometric and chemical determinants in protein–membrane interactions.

5.

5

Comparisons of different approaches for MaSIF-PMP training. (a) 5-Fold cross-validation with models trained on different subsets of surface features: only the location of free electrons/proton donors (hbond), only electrostatic potential (elec), only hydropathy index (hphob), two geometric features (geom), all three chemical features (chem), and all five chemical and geometric features (all feat). (b) Transfer learning of the MaSIF-site model, which is trained on protein–protein interactions, to IBS predictions. Each column indicates the median ROC AUC for the 130-PMP test set with the MaSIF-site, MaSIF-PMP with transfer learning strategies 1 (TL opt.1) and 2 (TL opt.2), and MaSIF-PMP without transfer learning.

Because surface fingerprint descriptors have demonstrated material-agnostic applications in predicting diverse protein–ligand interactions, we next tested whether we could derive models from the MaSIF-site model, which is trained for protein–protein interactions, to improve IBS predictions via transfer learning. That is, we sought to determine whether the latent space of surface fingerprints for protein–protein interactions is sufficiently similar to the latent space of fingerprints for IBS predictions so that the extensive protein–protein interaction data used to train MaSIF-site could be leveraged to improve MaSIF-PMP. Two transfer learning (TL) strategies were employed: (i) freezing convolutional layers and retraining only a deeper fully connected network (TL opt.1) using the PMP training data or (ii) freezing convolutional layers and adding new convolutional layers for retraining (TL opt.2). Figure b compares model accuracy on the 130 PMPs test set for MaSIF-site (i.e., a model trained on protein–protein interactions without any retraining), TL opt.1, TL opt.2, and MaSIF-PMP (i.e., a model trained on labeled IBS without any transfer learning). MaSIF-site yielded a median ROC AUC of 0.50, indicating an essentially random guess when applied for IBS prediction. Both TL strategies produced only modest improvements and did not approach the prediction accuracy of MaSIF-PMP. These results, together with the ablation study presented in Figure a, emphasize the fundamental differences in how surface features govern protein versus protein–membrane interactions.

2.4. Case Studies Using MD Simulations with the HMMM Model

MaSIF-PMP demonstrated a strong performance in predicting IBSs using surface features from static crystal structures. We next asked whether incorporating conformational dynamics from MD simulations could further improve performance because previous studies have shown that PMPs often undergo conformational changes upon membrane binding. Crystal structures typically represent soluble-state conformations, which may differ substantially from the membrane-bound state and thus fail to capture IBS-relevant structural features. MD simulations provide an effective means to explore conformational dynamics and potentially identify conformations more relevant to membrane interactions, ,− but their computational cost increases with the number of simulated particles, and the required bilayer dimensions and complexity can vary widely between PMPs. We first performed MD simulations of PMPs in aqueous solution in an attempt to obtain a wider range of conformations as a form of data augmentation to improve IBS predictions, but conformations sampled from aqueous-phase simulations provide limited improvement to IBS predictions (Supporting Figure 14). Although MD simulations of PMPs in solution were not effective for improving MaSIF-PMP predictions, we next sought to evaluate whether MD simulations could (i) validate predicted IBSs for PMPs with experimental annotations, (ii) generate IBS labels for PMPs lacking experimental annotations, and (iii) enhance prediction accuracy by sampling PMP conformations relevant to the membrane-bound state. We thus evaluated case studies using MD simulations with the highly mobile membrane mimetic (HMMM) model. The HMMM model accelerates lateral lipid diffusion by truncating full-length lipids and replacing the bilayer hydrophobic core with an organic solvent region. This approach enables sampling of protein–membrane interactions at reduced computational cost while preserving atomic-level interaction details.

We first performed MD simulations of α-tocopherol transfer protein (α-TTP), a well-characterized PMP not included in our training data set (Figure ). α-TTP is a cytosolic liver protein that undergoes conformational changes upon membrane binding, notably involving distortion of its N-terminal domain toward the membrane plane, , but previous MD studies did not explicitly identify membrane-contacting residues at its IBS. We prepared systems with the protein positioned ∼1 nm above the HMMM membrane and performed unbiased simulations to sample protein–membrane interactions. IBS regions were defined from the membrane-bound conformations at the end of the trajectories by using a 0.5 nm distance threshold between heavy atoms of protein side chains and lipid atoms at the membrane surface. We further performed replica simulations with multiple orientations of α-TTP relative to the HMMM membrane to assess the reliability of the MD-based IBS labels. Although a small number of organic solvent molecules escaped from the HMMM bilayer (Figure a), this effect reflects the natural partitioning of the solvent between lipid and solution and does not affect IBS predictions. The residues in contact with the membrane were consistent across the trajectory in each replica simulation, leading to robust HMMM-derived IBS labels (Supporting Figure 15). Using the α-TTP crystal structure as input and HMMM-derived labels from the final membrane-bound simulation configuration as the ground truth, MaSIF-PMP achieved a ROC AUC of 0.81, which remained comparable (0.75) when the HMMM-derived membrane-bound conformation was used instead (Figure b). α-TTP exhibited stable membrane interactions in specific orientations, with high overlap among IBS labels derived from different replicas, and MaSIF-PMP’s prediction accuracy on the crystal structure was comparable whether IBS labels were taken from individual replicas or from a consensus IBS identified based on time-averaging over multiple simulation configurations (Supporting Figure 15). These results highlight the robustness and reliability of the HMMM-based IBS labels when compared to MaSIF-PMP predictions.

6.

6

HMMM MD simulations of PMP-membrane binding and comparison to MaSIF-PMP predictions. (a) Representative snapshots of α-TTP after 100 ns of HMMM MD simulation. The IBS contacting the membrane is highlighted in red, with an enlarged snapshot of the protein in contact with the membrane within the red dashed box. The HMMM membrane is shown in a van der Waals representation with carbon atoms in cyan, hydrogen atoms in white, oxygen atoms in red, nitrogen atoms in blue, phosphorus atoms in orange, and organic solvent 1,1-dichloroethane (DCLE) in yellow. α-tocopherol, a bound ligand in complex with α-TTP, is in purple. α-TTP is drawn in a ribbon representation in gray, with the surface shown in the enlarged image. Water molecules and ions are not shown to aid visualization. (b) Comparison of ground-truth IBS labels determined by HMMM MD and MaSIF-PMP predictions for α-TTP using either the crystal structure or the membrane-bound conformation obtained from HMMM MD.

We observed similar patterns in a second case study with the oxysterol-binding protein homologue (Osh4), a PMP that undergoes structural rearrangement in which six initially separated domains reorganize into a composite IBS with the anionic trans-Golgi network (TGN) membrane. ,, We performed HMMM simulations using lipids that mimic a TGN membrane and sampled a membrane-bound conformation within 100 ns; prior MD simulations required ∼300 ns to observe membrane binding using full-length lipid simulations, demonstrating the computational efficiency of the HMMM approach. HMMM-predicted IBS labels overlapped with the labels from the previous studies, , although they encompassed fewer residues within the IBS regions. Using IBS labels from the literature studies, , MaSIF-PMP yielded ROC AUC values of 0.63 for the crystal structure and 0.64 for the membrane-bound conformation (Supporting Figure 16). In contrast, with HMMM-predicted IBS labels, MaSIF-PMP attained ROC AUC values of 0.74 for the crystal structure and 0.68 for the membrane-bound conformation (Supporting Figure 16), underscoring the robustness and accuracy of HMMM-derived IBS labels compared to both prior annotations and MaSIF-PMP predictions.

Next, we performed HMMM MD simulations of PMPs with IBSs that were poorly predicted in the MaSIF-PMP test set to investigate whether poor prediction performance stemmed from limitations in the data set’s IBS annotationsnamely, due to homology-based labels that lacked direct experimental confirmation. We selected two such proteins: phospholipase A2 (PDB ID: 1OZY) and glycosyl hydrolase (PDB ID: 4LPL), which yielded ROC AUC scores of 0.54 and 0.42, respectively, when their crystal structures were used as model input. HMMM simulations revealed membrane composition-dependent binding modes for phospholipase A2, with distinct IBS regions emerging in anionic versus zwitterionic membranes (Figure a, Supporting Figure 17). The anionic membrane yielded more consistent consensus IBSs across replica simulations with different initial PMP orientations, suggesting a more favorable and stable binding mode (Supporting Figure 17). IBS labels combined from these replicas (referred to as the “union IBS”) captured the dynamic nature of the protein–membrane interface (Supporting Figure 17). Notably, MaSIF-PMP predictions have a higher ROC AUC when compared to the anionic HMMM-derived union IBS than with either the original data set labels or the zwitterionic HMMM-derived union IBS (Figure b), reflecting the model’s ability to identify preferred, membrane-composition-specific binding interfaces. A similar trend was observed for glycosyl hydrolase, although membrane composition had less influence on the binding mode (Supporting Figure 18, Figure c). These findings suggest that the omission of membrane composition relevant to PMP binding may contribute to low prediction performance, potentially due to mislabeling in the original data set when using homology-based labeling, and highlight the potential of HMMM simulations to refine poorly predicted labels at a manageable computational cost.

7.

7

HMMM MD is a tool to generate interface labels for proteins lacking experimental annotations. (a) Snapshots of phospholipase A2 (PDB ID: 1OZY) in contact with different types of membranes sampled from HMMM MD simulations. IBSs for each membrane are highlighted in red. Color schemes follow the same definition used in Figure a. (b, c) Snapshots of ground-truth IBS labels (based on data set annotations and HMMM MD simulations) and predicted IBS scores for (b) phospholipase A2 (PDB ID: 1OZY) and (c) glycosyl hydrolase (PDB ID: 4LPL). Tables show corresponding ROC AUC values computed with different ground-truth label types. “anionic memb.” refers to an anionic membrane, while “zwit. memb.” refers to zwitterionic membrane.

Together, these case studies demonstrate that MD simulations with the HMMM model are suitable for capturing conformational dynamics in membrane binding and for refining IBS annotations. While membrane-bound conformations from HMMM yielded reasonable but slightly lower prediction accuracy than crystal structures, these results suggest that MaSIF-PMP predictions are consistent with HMMM labels when using either crystal structures or MD-derived conformations as input and that improved selection of representative membrane-bound states could enhance model agreement.

3. Discussion

In this work, we developed MaSIF-PMP, a geometric deep learning model that leverages surface fingerprints to predict IBSs on PMPs. By encoding both geometric and chemical surface features into numerical descriptors, the model learns to discriminate between IBS and non-IBS patches following training on a large PMP data set. Our results demonstrate that surface fingerprints serve as robust, material-agnostic descriptors, extending their applicability beyond protein–protein interactions to protein–membrane systems while revealing distinct latent feature spaces underlying each interaction type. In addition, MD simulations with the HMMM model proved valuable for validating predictions, capturing distinct membrane-bound conformations of proteins depending on membrane composition and refining IBS labels at manageable computational cost.

Despite these advances, several limitations of our approach should be acknowledged. The current convolutional neural network architecture, which incorporates rotational data augmentation and processes single proteins with at least 1,000 patches per batch, is memory intensive. As computational resources allow, training with larger surface patches, extended geodesic radii, and soft grids with more trainable Gaussian kernels may enhance the prediction performance. Data set-related limitations also remain: the model was trained on single-chain protein structures and thus does not account for cases where PMPs interact with membranes as multimers (e.g., BAR proteins that bind membranes as dimers to sense curvature). Furthermore, many IBS annotations of the PMP data set were indirectly inferred based on homology rather than directly validated; generating more robust labels via MD simulations and incorporating information about specific membrane compositions could provide richer training data and enhance generalizability.

Several directions may further advance this framework in future work. Incorporating conformational flexibility into surface fingerprints, particularly geometric features, or into IBS annotations as soft labels through dynamic ensembles derived from HMMM simulations could improve the predictions for PMPs undergoing structural rearrangements. Integrating such dynamic conformations, along with membrane surface features determined by lipid composition, may ultimately improve the predictive accuracy and biological relevance of surface-based models and enable the development of a framework for identifying proteins that bind to membranes with distinct compositions that is analogous to methods to identify protein–protein interaction partners. Replacing certain descriptors with more efficient alternatives, such as topological data analysis features, could accelerate preprocessing and enable training with more extensive surface information. Finally, predicted IBSs of PMPs implicated in diseases through dysregulated membrane interactions may serve as therapeutic targets by enabling the explicit modeling of PMP–ligand interactions.

4. Methods

Data Set

PMPs were obtained from a previously published data set comprising 1,199 experimentally determined PMP structures, each annotated with constituent amino acids, IBS labels, and structural features. Ten PMPs from the data set were excluded from the training and testing sets; some were omitted due to extensive unmodeled residues in the central region of their sequences, which resulted in misleading surface representations, while others lacked IBS annotations entirely (Supporting Table 1). The training and testing sets consisted of 1059 and 130 proteins, respectively, following the same split ratio used in the original study of MaSIF-site. Proteins were split based on their superfamilies to maintain similar superfamily distributions between sets (Supporting Table 2) an important consideration since the IBS labeling procedure for the data set assumes that structurally related proteins within the same superfamily share similar IBSs. IBS annotations were primarily inferred through structural superposition, aligning PMP structures with homologous domains and transferring IBSs from annotated members supported by experimental or simulation-based evidence. This approach provided a scalable means of annotation but represents a compromise necessitated by the high computational cost of validating IBS labels via molecular dynamics simulations for all proteins in the data set. Further details on the structural split are described in the Supporting Information.

Model: MaSIF-PMP

MaSIF-PMP is similar to the previously developed protein interaction site prediction model, MaSIF-site. Compared to MaSIF-site, our model architecture is implemented in PyTorch v2.1.2 instead of Tensorflow v1, PMP data sets are used for training and testing, and the method for defining interface labels for PMPs differs from the original model.

Computation of Discretized Molecular Surfaces

The MSMS program was used to compute all molecular surfaces in this study (density = 3.0, water radius = 1.5 Å). As MSMS generates molecular surfaces with highly irregular meshes, the resulting meshes were further regularized using PyMESH (v.0.3.1) at a resolution of 1.0 Å. Consequently, each protein’s molecular surface is represented as a discretized triangulated mesh (Supporting Figure 2).

Decomposition of Protein Surfaces into Overlapping Radial Patches

For each point on the discretized protein surface mesh, we extracted a radial patch with a geodesic radius of 9 Å to capture local surface information. Each patch was limited to a maximum of 100 mesh points. This choice of a 9 Å radius with a 100-point cap was made to reduce memory requirements, thereby allowing the use of multiple convolutional layers, which are critical for accurate interface prediction. The geodesic distancesdefined as distances along the protein surfacerequired to define these patches were computed using the Dijkstra algorithm.

Computation of Geodesic Polar Coordinates

After the surface patches were extracted from a protein, our geometric deep-learning pipeline computed radial and angular coordinates for the mesh points within each patch. Representing local surface features in this geodesic polar coordinate system allows the model to map the patch onto a 2D plane. The radial coordinate is determined by using the Dijkstra algorithm, which computes the geodesic distance from the patch center to each point. To determine the angular coordinate, pairwise geodesic distances between points within the patch are first computed, and multidimensional scaling (MDS), as implemented in scikit-learn, is then used to project the points onto a 2D plane. A random direction in this plane is selected as the 0° reference axis, and the angle of each point relative to this axis is computed following previously described procedures.

Geometric and chemical surface features

Each point within a patch of the computed molecular surface was assigned an array of two geometric features (shape index and distance-dependent curvature) and three chemical features (hydropathy index, Poisson–Boltzmann electrostatic potential, and hydrogen bond potential , ). These features are identical to those used in MaSIF-site. Details on each feature and how they are computed are described in the Supporting Information.

Definition of Interfacial Binding Site Points in a Protein Surface

Using the residue-level IBS annotations of PMPs in the data set, we defined the ground-truth labels for the protein surface mesh points. Since each mesh point carries name and index attributes of its nearest residue, IBS labels could be assigned directly based on the corresponding data set annotation for the nearest residue.

Geometric Deep Learning on a Learned Soft Polar Grid

Geometric deep learning extends deep learning techniques, such as convolutional neural networks (CNNs), to non-Euclidean data structures like protein surfaces. In this work, we employed a soft polar grida system of Gaussian kernels defined in a local geodesic polar coordinate system that act as soft, overlapping pixelsas described previously for MaSIF-site. This soft grid serves as an analogue to the sliding window used in conventional CNNs for image analysis, where a window moves across the image, extracts a patch of pixels, multiplies them by corresponding learnable filter weights, and sums the results. The parameters of the Gaussian kernels are themselves learnable, enabling the network to adaptively capture local surface patterns. , Once the mapping is complete, a traditional CNN layer is applied together with angular maximum pooling, ensuring that the resulting fingerprint vectors are invariant to the randomly assigned polar coordinates of geodesic patches. Further methodological details are provided in the Supporting Information, Supporting Figure 3 and refs and .

Neural Network, Cost Function, and Training Optimization

The neural network architecture consists of three convolutional layers with learnable soft grids (Supporting Figure 3). The optimal number of convolutional layers was determined through a systematic evaluation by incrementally increasing the number of layers (Supporting Figure 4). The network takes as input the full protein surface decomposed into overlapping patches with a radius of 9 Å. The patch radius was chosen to balance surface coverage with memory efficiency, thereby enabling the use of deeper convolutional layers. Each patch is mapped onto learned grids with three radial bins and four angular bins. The network outputs an IBS score between 0 and 1 for the center point of each patch. During training, each protein was treated as a single batch, and the network was optimized using an Adam optimizer with a sigmoid cross-entropy loss function. Because non-IBS points greatly outnumber IBS points in most proteins, negative samples were randomly subsampled to balance the data set and ensure an equal number of positive and negative samples. Training was performed on an NVIDIA L40 GPU for 50 epochs (approximately 1.55 h), with all proteins in the training set processed once per epoch. Model performance was evaluated using the per protein ROC AUC metric; 10% of the training data was set aside as a validation set to monitor model performance. The model was saved whenever the validation ROC AUC improved, with the final saved model occurring at epoch 48. Beyond epoch 50, the validation ROC AUC reached a plateau, indicating that 48 epochs were sufficient for the network to converge (Supporting Figure 4a).

5-Fold Cross-Validation with Feature Masking

We performed 5-fold cross-validation using the PMP data set to evaluate the contribution of individual surface features to IBS characterization. For each fold, models were systematically trained with specific features masked such that the masked features were excluded from training. All experiments were conducted over 50 training epochs.

Transfer Learning

We performed two different strategies for transfer learning of the MaSIF-site, which is trained on protein–protein interactions, to MaSIF-PMP. First, we froze the updated parameters of convolutional layers of the MaSIF-site and replaced the final fully connected networks with deeper ones (FC128, FC64, FC4, FC2). Second, we froze the same convolutional layers and then added new three convolutional layers so that only the parameters of additional layers and final fully connected network block are updated during training on PMP data set. Schematics and further details of each structure are provided in the Supporting Information and Supporting Figure 13. Training on protein–protein interactions was performed under the same conditions as MaSIF-site whereas training on the PMP data set followed the conditions described for MaSIF-PMP in the Methods.

Highly Mobile Membrane-Mimetic (HMMM) Simulations

HMMM Model

The highly mobile membrane-mimetic (HMMM) model accelerates the lateral diffusion of lipid molecules by one to two orders of magnitude while preserving atomic-level interaction details by replacing the hydrophobic core with an organic solvent region and truncating full-length lipids. This approach is advantageous for all-atom protein–membrane interaction simulations, as conventional full-length lipid bilayers exhibit slow membrane reorganization dynamics. HMMM systems were prepared using the CHARMM-GUI HMMM Builder. Default parameters were used for the terminal acyl carbon number (6), proteins were protonated assuming pH 7.0, standard charged termini were applied (N-terminus: −NH3 +; C-terminus: −COO), 1,1-dichloroethane (DCLE) was used as an organic solvent, and systems were neutralized with NaCl counterions in explicit water. All simulations employed the CHARMM36m force field and TIP3P water model. Systems were energy-minimized and equilibrated under NPT conditions by using the CHARMM-GUI-generated simulation protocols. Production simulations of 100 ns were performed at constant NPT, maintaining a temperature of 310.15 K with a velocity-rescale thermostat and a pressure of 1 bar using a semiisotropic stochastic cell-rescaling barostat. Further details are provided in the Supporting Information.

System Preparation for Case Studies

For α-tocopherol transfer protein (α-TTP), we adopted conditions from a previous MD study, using the same crystal structure (PDB ID: 3W67), lipid composition, and simulations under NPT conditions at 300 K. For additional case studies involving PMPs with low prediction performance, HMMM simulations were performed with two representative membrane types (anionic and zwitterionic) under NPT conditions at 310.15 K and 1 bar. In all systems, proteins were initially placed ∼ 1 nm above the HMMM membrane surface by using different orientations for different replicas. For oxysterol-binding homologue (Osh4), we used the experimentally determined structure (PDB ID: 1ZHZ), which has also been employed in previous computational studies of Osh4–membrane interactions using all-atom simulations. , HMMM membranes with anionic lipid compositions were prepared to match the conditions of these prior studies. Simulations were performed under NPT conditions at 303.15 K, consistent with the experimental and computational setups reported previously. , Further details of the system setup and parameters are provided in the Supporting Information.

IBS Labels Determination from HMMM Simulations

The last simulation configuration of each trajectory for which the protein and HMMM membrane were in contact was used to define the IBS labels. Protein residues with heavy atoms within 0.5 nm of the heavy atoms of truncated lipids on the membrane surface were designated as IBS residues. Details on defining consensus and union IBS labels using trajectories of replica simulations are described in the Supporting Information.

Supplementary Material

ci5c02566_si_001.pdf (3.8MB, pdf)

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. CBET-2346683. This work used the Center for High Throughput Computing (CHTC, doi: 10.21231/GNT1-HW21).

Source data files for the simulations of PMPs with HMMM model systems, along with scripts used to generate MD-based IBS labels and the resulting label data, are available via the following Dryad link: https://doi.org/10.5061/dryad.1rn8pk175. The codes and scripts used to perform data preprocessing, develop models, perform the analyses, and generate results in this study are publicly available and have been deposited in https://github.com/byungukP/masif_pmp under an open MIT license.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c02566.

  • Additional information on MaSIF-PMP parameters and surface feature calculations, additional validation tests and comparisons, and details on simulation methods and parameters; Supporting figures; Supporting tables (PDF)

The authors declare no competing financial interest.

References

  1. Leventis P. A., Grinstein S.. The distribution and function of phosphatidylserine in cellular membranes. Annual review of biophysics. 2010;39:407–427. doi: 10.1146/annurev.biophys.093008.131234. [DOI] [PubMed] [Google Scholar]
  2. Cho W., Stahelin R. V.. Membrane-protein interactions in cell signaling and membrane trafficking. Annu. Rev. Biophys. Biomol. Struct. 2005;34:119–151. doi: 10.1146/annurev.biophys.33.110502.133337. [DOI] [PubMed] [Google Scholar]
  3. Stahelin R. V.. Lipid binding domains: more than simple lipid effectors. Journal of lipid research. 2009;50:S299–S304. doi: 10.1194/jlr.R800078-JLR200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Seaton, B. A. ; Roberts, M. F. . Peripheral membrane proteins. Biological Membranes: A Molecular Perspective from Computation and Experiment; Springer, 1996; pp 355–403. [Google Scholar]
  5. Mulgrew-Nesbitt A., Diraviyam K., Wang J., Singh S., Murray P., Li Z., Rogers L., Mirkovic N., Murray D.. The role of electrostatics in protein–membrane interactions. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 2006;1761(8):812–826. doi: 10.1016/j.bbalip.2006.07.002. [DOI] [PubMed] [Google Scholar]
  6. Labrecque C. L., Fuglestad B.. Ligandability at the membrane interface of GPx4 revealed through a reverse micelle fragment screening platform. JACS Au. 2024;4(7):2676–2686. doi: 10.1021/jacsau.4c00427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Boes D. M., Godoy-Hernandez A., McMillan D. G. G.. Peripheral membrane proteins: promising therapeutic targets across domains of life. Membranes. 2021;11(5):346. doi: 10.3390/membranes11050346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cournia Z., Chatzigoulas A.. Allostery in membrane proteins. Curr. Opin. Struct. Biol. 2020;62:197–204. doi: 10.1016/j.sbi.2020.03.006. [DOI] [PubMed] [Google Scholar]
  9. Segers K., Sperandio O., Sack M., Fischer R., Miteva M. A., Rosing J., Nicolaes G. A. F., Villoutreix B. O.. Design of protein–membrane interaction inhibitors by virtual ligand screening, proof of concept with the C2 domain of factor V. Proc. Natl. Acad. Sci. U. S. A. 2007;104(31):12697–12702. doi: 10.1073/pnas.0701051104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Spiegel P. C., Kaiser S. M., Simon J. A., Stoddard B. L.. Disruption of protein-membrane binding and identification of small-molecule inhibitors of coagulation factor VIII. Chemistry & biology. 2004;11(10):1413–1422. doi: 10.1016/j.chembiol.2004.08.006. [DOI] [PubMed] [Google Scholar]
  11. Chen L., Du-Cuny L., Moses S., Dumas S., Song Z., Rezaeian A. H., Lin H.-K., Meuillet E. J., Zhang S.. Novel inhibitors induce large conformational changes of GAB1 pleckstrin homology domain and kill breast cancer cells. PLoS computational biology. 2015;11(1):e1004021. doi: 10.1371/journal.pcbi.1004021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Li Z., Buck M.. Computational design of myristoylated cell-penetrating peptides targeting oncogenic K-Ras. G12D at the effector-binding membrane interface. J. Chem. Inf. Model. 2020;60(1):306–315. doi: 10.1021/acs.jcim.9b00690. [DOI] [PubMed] [Google Scholar]
  13. Liu Z., Lin L., Yuan C., Nicolaes G. A. F., Chen L., Meehan E. J., Furie B., Furie B., Huang M.. Trp2313-His2315 of factor VIII C2 domain is involved in membrane binding: structure of a complex between the C2 domain and an inhibitor of membrane binding. J. Biol. Chem. 2010;285(12):8824–8829. doi: 10.1074/jbc.M109.080168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Nawrotek A., Benabdi S., Niyomchon S., Kryszke M.-H., Ginestier C., Cañeque T., Tepshi L., Mariani A., St. Onge R. P., Giaever G.. et al. PH-domain-binding inhibitors of nucleotide exchange factor BRAG2 disrupt Arf GTPase signaling. Nat. Chem. Biol. 2019;15(4):358–366. doi: 10.1038/s41589-019-0228-3. [DOI] [PubMed] [Google Scholar]
  15. Nicolaes G. A. F., Kulharia M., Voorberg J., Kaijen P. H., Wroblewska A., Wielders S., Schrijver R., Sperandio O., Villoutreix B. O.. Rational design of small molecules targeting the C2 domain of coagulation factor VIII. Blood, The Journal of the American Society of Hematology. 2014;123(1):113–120. doi: 10.1182/blood-2013-05-503227. [DOI] [PubMed] [Google Scholar]
  16. Tubiana T., Sillitoe I., Orengo C., Reuter N.. Dissecting peripheral protein-membrane interfaces. PLOS Computational Biology. 2022;18(12):e1010346. doi: 10.1371/journal.pcbi.1010346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Cannon K. S., Sarsam R. D., Tedamrongwanish T., Zhang K., Baker R. W.. Lipid nanodiscs as a template for high-resolution cryo-EM structures of peripheral membrane proteins. J. Struct. Biol. 2023;215(3):107989. doi: 10.1016/j.jsb.2023.107989. [DOI] [PubMed] [Google Scholar]
  18. Corey R. A., Stansfeld P. J., Sansom M. S. P.. The energetics of protein–lipid interactions as viewed by molecular simulations. Biochem. Soc. Trans. 2020;48(1):25–37. doi: 10.1042/BST20190149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Larsen A. H., John L. H., Sansom M. S. P., Corey R. A.. Specific interactions of peripheral membrane proteins with lipids: what can molecular simulations show us? Biosci Rep. 2022;42(4):na. doi: 10.1042/BSR20211406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Monje-Galvan V., Klauda J. B.. Peripheral membrane proteins: Tying the knot between experiment and computation. Biochim. Biophys. Acta. 2016;1858(7):1584–1593. doi: 10.1016/j.bbamem.2016.02.018. [DOI] [PubMed] [Google Scholar]
  21. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A. J., Bambrick J.. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Krishna R., Wang J., Ahern W., Sturmfels P., Venkatesh P., Kalvet I., Lee G. R., Morey-Burrows F. S., Anishchenko I., Humphreys I. R.. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024;384(6693):eadl2528. doi: 10.1126/science.adl2528. [DOI] [PubMed] [Google Scholar]
  23. Passaro S., Corso G., Wohlwend J., Reveiz M., Thaler S., Somnath V. R., Getz N., Portnoi T., Roy J., Stark H.. Boltz-2: Towards accurate and efficient binding affinity prediction. BioRxiv. 2025:2025–2006. [Google Scholar]
  24. Chai Discovery t., Boitreaud J., Dent J., McPartlon M., Meier J., Reis V., Rogozhonikov A., Wu K.. Chai-1: Decoding the molecular interactions of life. BioRxiv. 2024:2024–2010. [Google Scholar]
  25. Hashemifar S., Neyshabur B., Khan A. A., Xu J.. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics. 2018;34(17):i802–i810. doi: 10.1093/bioinformatics/bty573. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Li Y., Golding G. B., Ilie L.. DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics. 2021;37(7):896–904. doi: 10.1093/bioinformatics/btaa750. [DOI] [PubMed] [Google Scholar]
  27. Gao Z., Jiang C., Zhang J., Jiang X., Li L., Zhao P., Yang H., Huang Y., Li J.. Hierarchical graph learning for protein–protein interaction. Nat. Commun. 2023;14(1):1093. doi: 10.1038/s41467-023-36736-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Jha K., Saha S., Singh H.. Prediction of protein–protein interaction using graph neural networks. Sci. Rep. 2022;12(1):8360. doi: 10.1038/s41598-022-12201-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Tubiana J., Schneidman-Duhovny D., Wolfson H. J.. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nat. Methods. 2022;19(6):730–739. doi: 10.1038/s41592-022-01490-7. [DOI] [PubMed] [Google Scholar]
  30. Li P., Liu Z.-P.. GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning. Nucleic acids research. 2023;51(10):e60–e60. doi: 10.1093/nar/gkad288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Zhang Y., Qiao S., Ji S., Li Y.. DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding. International Journal of Machine Learning and Cybernetics. 2020;11(4):841–851. doi: 10.1007/s13042-019-00990-x. [DOI] [Google Scholar]
  32. Sun L., Xu K., Huang W., Yang Y. T., Li P., Tang L., Xiong T., Zhang Q. C.. Predicting dynamic cellular protein–RNA interactions by deep learning using in vivo RNA structures. Cell research. 2021;31(5):495–516. doi: 10.1038/s41422-021-00476-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Baek M., McHugh R., Anishchenko I., Jiang H., Baker D., DiMaio F.. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods. 2024;21(1):117–121. doi: 10.1038/s41592-023-02086-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Dauparas J., Lee G. R., Pecoraro R., An L., Anishchenko I., Glasscock C., Baker D.. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods. 2025;22:717. doi: 10.1038/s41592-025-02626-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. van Hilten N., Verwei N., Methorst J., Nase C., Bernatavicius A., Risselada H. J.. PMIpred: a physics-informed web server for quantitative protein–membrane interaction prediction. Bioinformatics. 2024;40(2):btae069. doi: 10.1093/bioinformatics/btae069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Chatzigoulas A., Cournia Z.. Predicting protein–membrane interfaces of peripheral membrane proteins using ensemble machine learning. Briefings in Bioinformatics. 2022;23(2):na. doi: 10.1093/bib/bbab518. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Lomize A. L., Todd S. C., Pogozheva I. D.. Spatial arrangement of proteins in planar and curved membranes by PPM 3.0. Protein Sci. 2022;31(1):209–220. doi: 10.1002/pro.4219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Kufareva I., Lenoir M., Dancea F., Sridhar P., Raush E., Bissig C., Gruenberg J., Abagyan R., Overduin M.. Discovery of novel membrane binding structures and functions. Biochemistry and Cell Biology. 2014;92(6):555–563. doi: 10.1139/bcb-2014-0074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Gainza P., Sverrisson F., Monti F., Rodolà E., Boscaini D., Bronstein M. M., Correia B. E.. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods. 2020;17(2):184–192. doi: 10.1038/s41592-019-0666-6. [DOI] [PubMed] [Google Scholar]
  40. Gainza P., Wehrle S., Van Hall-Beauvais A., Marchand A., Scheck A., Harteveld Z., Buckley S., Ni D., Tan S., Sverrisson F.. et al. De novo design of protein interactions with learned surface fingerprints. Nature. 2023;617(7959):176–184. doi: 10.1038/s41586-023-05993-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Marchand A., Buckley S., Schneuing A., Pacesa M., Elia M., Gainza P., Elizarova E., Neeser R. M., Lee P.-W., Reymond L.. et al. Targeting protein–ligand neosurfaces with a generalizable deep learning tool. Nature. 2025;639(8054):522–531. doi: 10.1038/s41586-024-08435-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Petzold G., Gainza P., Annunziato S., Lamberto I., Trenh P., McAllister L. A., DeMarco B., Schwander L., Bunker R. D., Zlotosch M.. et al. Mining the CRBN target space redefines rules for molecular glue–induced neosubstrate recognition. Science. 2025;389(6755):eadt6736. doi: 10.1126/science.adt6736. [DOI] [PubMed] [Google Scholar]
  43. Berman H. M., Westbrook J., Feng Z., Gilliland G., Bhat T. N., Weissig H., Shindyalov I. N., Bourne P. E.. The protein data bank. Nucleic acids research. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Fuglebakk E., Reuter N.. A model for hydrophobic protrusions on peripheral membrane proteins. PLoS computational biology. 2018;14(7):e1006325. doi: 10.1371/journal.pcbi.1006325. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Arcario M. J., Ohkubo Y. Z., Tajkhorshid E.. Capturing spontaneous partitioning of peripheral proteins using a biphasic membrane-mimetic model. J. Phys. Chem. B. 2011;115(21):7029–7037. doi: 10.1021/jp109631y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Rogaski B., Klauda J. B.. Membrane-binding mechanism of a peripheral membrane protein through microsecond molecular dynamics simulations. Journal of molecular biology. 2012;423(5):847–861. doi: 10.1016/j.jmb.2012.08.015. [DOI] [PubMed] [Google Scholar]
  47. Koivuniemi A., Vuorela T., Kovanen P. T., Vattulainen I., Hyvönen M. T.. Lipid exchange mechanism of the cholesteryl ester transfer protein clarified by atomistic and coarse-grained simulations. PLoS computational biology. 2012;8(1):e1002299. doi: 10.1371/journal.pcbi.1002299. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Lamprakis C., Stocker A., Cascella M.. Mechanisms of recognition and binding of α-TTP to the plasma membrane by multi-scale molecular dynamics simulations. Frontiers in molecular biosciences. 2015;2:36. doi: 10.3389/fmolb.2015.00036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Grabon A., Orłowski A., Tripathi A., Vuorio J., Javanainen M., Róg T., Lönnfors M., McDermott M. I., Siebert G., Somerharju P.. et al. Dynamics and energetics of the mammalian phosphatidylinositol transfer protein phospholipid exchange cycle. J. Biol. Chem. 2017;292(35):14438–14455. doi: 10.1074/jbc.M117.791467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Galassi V. V., Villarreal M. A., Posada V., Montich G. G.. Interactions of the fatty acid-binding protein ReP1-NCXSQ with lipid membranes. Influence of the membrane electric field on binding and orientation. Biochimica et Biophysica Acta (BBA)-Biomembranes. 2014;1838(3):910–920. doi: 10.1016/j.bbamem.2013.11.008. [DOI] [PubMed] [Google Scholar]
  51. Ohkubo Y. Z., Pogorelov T. V., Arcario M. J., Christensen G. A., Tajkhorshid E.. Accelerating membrane insertion of peripheral proteins with a novel membrane mimetic model. Biophys. J. 2012;102(9):2130–2139. doi: 10.1016/j.bpj.2012.03.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Arita M., Nomura K., Arai H., Inoue K.. α-Tocopherol transfer protein stimulates the secretion of α-tocopherol from a cultured liver cell line through a brefeldin A-insensitive pathway. Proc. Natl. Acad. Sci. U. S. A. 1997;94(23):12437–12441. doi: 10.1073/pnas.94.23.12437. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Horiguchi M., Arita M., Kaempf-Rotzoll D. E., Tsujimoto M., Inoue K., Arai H.. pH-dependent translocation of α-tocopherol transfer protein (α-TTP) between hepatic cytosol and late endosomes. Genes to Cells. 2003;8(10):789–800. doi: 10.1046/j.1365-2443.2003.00676.x. [DOI] [PubMed] [Google Scholar]
  54. Qi Y., Cheng X., Lee J., Vermaas J. V., Pogorelov T. V., Tajkhorshid E., Park S., Klauda J. B., Im W.. CHARMM-GUI HMMM builder for membrane simulations with the highly mobile membrane-mimetic model. Biophysical journal. 2015;109(10):2012–2022. doi: 10.1016/j.bpj.2015.10.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Im Y. J., Raychaudhuri S., Prinz W. A., Hurley J. H.. Structural mechanism for sterol sensing and transport by OSBP-related proteins. Nature. 2005;437(7055):154–158. doi: 10.1038/nature03923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Mim C., Unger V. M.. Membrane curvature and its generation by BAR proteins. Trends in biochemical sciences. 2012;37(12):526–533. doi: 10.1016/j.tibs.2012.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L.. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems. 2019;32:na. [Google Scholar]
  58. Abadi, M. ; Barham, P. ; Chen, J. ; Chen, Z. ; Davis, A. ; Dean, J. ; Devin, M. ; Ghemawat, S. ; Irving, G. ; Isard, M. . TensorFlow: a system for large-scale machine learning. 2016; pp 265–283.
  59. Sanner M. F., Olson A. J., Spehner J. C.. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers. 1996;38(3):305–320. doi: 10.1002/(SICI)1097-0282(199603)38:3<305::AID-BIP4>3.0.CO;2-Y. [DOI] [PubMed] [Google Scholar]
  60. Zhou, Q. Pymeshgeometry processing library for python. 2019, 7. Software available for download at https://github.com/PyMesh/PyMesh. [Google Scholar]
  61. Yin S., Proctor E. A., Lugovskoy A. A., Dokholyan N. V.. Fast screening of protein surfaces using geometric invariant fingerprints. Proc. Natl. Acad. Sci. U. S. A. 2009;106(39):16622–16626. doi: 10.1073/pnas.0906146106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Dijkstra E. W.. A note on two problems in connexion with graphs. Numerishe Mathematik 1. 1959;1:269. doi: 10.1007/BF01386390. [DOI] [Google Scholar]
  63. Borg, I. ; Groenen, P. J. F. . Modern multidimensional scaling: Theory and applications; Springer, 2005. [Google Scholar]
  64. Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V.. Scikit-learn: Machine learning in Python. Journal of machine Learning research. 2011;12:2825–2830. [Google Scholar]
  65. Koenderink J. J., Van Doorn A. J.. Surface shape and curvature scales. Image and vision computing. 1992;10(8):557–564. doi: 10.1016/0262-8856(92)90076-F. [DOI] [Google Scholar]
  66. Kyte J., Doolittle R. F.. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982;157(1):105–132. doi: 10.1016/0022-2836(82)90515-0. [DOI] [PubMed] [Google Scholar]
  67. Jurrus E., Engel D., Star K., Monson K., Brandi J., Felberg L. E., Brookes D. H., Wilson L., Chen J., Liles K.. et al. Improvements to the APBS biomolecular solvation software suite. Protein science. 2018;27(1):112–128. doi: 10.1002/pro.3280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Kortemme T., Morozov A. V., Baker D.. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J. Mol. Biol. 2003;326(4):1239–1259. doi: 10.1016/S0022-2836(03)00021-4. [DOI] [PubMed] [Google Scholar]
  69. Morozov A. V., Kortemme T.. Potential functions for hydrogen bonds in protein structure prediction and design. Advances in protein chemistry. 2005;72:1–38. doi: 10.1016/S0065-3233(05)72001-5. [DOI] [PubMed] [Google Scholar]
  70. Krizhevsky A., Sutskever I., Hinton G. E.. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems. 2017;60:84. doi: 10.1145/3065386. [DOI] [Google Scholar]
  71. Monti F., Boscaini D., Masci J., Rodola E., Svoboda J., Bronstein M. M.. Geometric deep learning on graphs and manifolds using mixture model CNNs. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017;2017:5115–5124. [Google Scholar]
  72. Kingma D. P., Ba J.. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014:na. [Google Scholar]
  73. Best R. B., Zhu X., Shim J., Lopes P. E. M., Mittal J., Feig M., MacKerell A. D. Jr.. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone ϕ, ψ and side-chain χ1 and χ2 dihedral angles. J. Chem. Theory Comput. 2012;8(9):3257–3273. doi: 10.1021/ct300400x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Klauda J. B., Venable R. M., Freites J. A., O’Connor J. W., Tobias D. J., Mondragon-Ramirez C., Vorobyov I., MacKerell A. D. Jr., Pastor R. W.. Update of the CHARMM all-atom additive force field for lipids: Validation on six lipid types. J. Phys. Chem. B. 2010;114(23):7830–7843. doi: 10.1021/jp101759q. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Huang J., Rauscher S., Nawrocki G., Ran T., Feig M., De Groot B. L., Grubmüller H., MacKerell A. D. Jr. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14(1):71–73. doi: 10.1038/nmeth.4067. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ci5c02566_si_001.pdf (3.8MB, pdf)

Data Availability Statement

Source data files for the simulations of PMPs with HMMM model systems, along with scripts used to generate MD-based IBS labels and the resulting label data, are available via the following Dryad link: https://doi.org/10.5061/dryad.1rn8pk175. The codes and scripts used to perform data preprocessing, develop models, perform the analyses, and generate results in this study are publicly available and have been deposited in https://github.com/byungukP/masif_pmp under an open MIT license.


Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society

RESOURCES