Briefings in Bioinformatics. 2024 Mar 19;25(2):bbae105. doi: 10.1093/bib/bbae105

GPCR-IPL score: multilevel featurization of GPCR–ligand interaction patterns and prediction of ligand functions from selectivity to biased activation

Surendra Kumar 1, Mahesh K Teli 2, Mi-hyun Kim 3
PMCID: PMC10959162  PMID: 38517694

Abstract

G-protein-coupled receptors (GPCRs) mediate diverse cell signaling cascades after recognizing extracellular ligands. Despite the successful history of known GPCR drugs, a lack of mechanistic insight into GPCRs hampers both the deorphanization of some GPCRs and the optimization of the structure–activity relationship of their ligands. Notably, replacing a small substituent on a GPCR ligand can significantly alter the extracellular GPCR–ligand interaction patterns and, in turn, the motion of the transmembrane helices that gives rise to the post-binding events of the ligand. In this study, we designed 3D multilevel features to describe the extracellular interaction patterns. Subsequently, these 3D features were utilized to predict the post-binding events that result from conformational dynamics from the extracellular to intracellular areas. To understand the adaptability of GPCR ligands, we collected the conformational information of flexible residues during binding and performed molecular featurization on a broad range of GPCR–ligand complexes. As a result, we developed a score based on GPCR–ligand interaction patterns, binding pockets and ligand features (GPCR-IPL score) for predicting the functional selectivity of GPCR ligands (agonism versus antagonism), using multilevel features at (1) the zoomed-out ‘residue level’ (for flexible transmembrane helices of GPCRs), (2) the zoomed-in ‘pocket level’ (for sophisticated mode of action) and (3) the ‘atom level’ (for the conformational adaptability of GPCR ligands). The GPCR-IPL score demonstrated reliable performance, achieving an area under the receiver operating characteristic curve of 0.938 and an area under the precision-recall curve of 0.907 (available at gpcr-ipl-score.onrender.com). Furthermore, we used the molecular features to predict the biased activation of downstream signaling (Gi/o, Gq/11, Gs and β-arrestin) as well as the functional selectivity.
The resulting models are interpreted and applied to out-of-set validation with three scenarios including the identification of a new MRGPRX antagonist.

Keywords: molecular featurization, biased activation, functional selectivity, G-protein-coupled receptor, machine learning, protein–ligand complex interaction patterns

INTRODUCTION

G-protein-coupled receptors (GPCRs) regulate a wide range of intracellular signaling cascades in response to hormones, neurotransmitters, ions, photons, odorants and other stimuli [1]. Consequently, GPCR-binding drugs can induce, block or modulate signaling cascades, making the GPCR family the most pharmacologically important target among integral membrane proteins (~30% of FDA-approved drugs) [2]. The human genome contains over 800 GPCR family members, which regulate physiological processes such as cognition, mood, blood pressure regulation, smell, behavior and taste. GPCR proteins are important therapeutic targets for the treatment of numerous disorders [3–8]. Computational approaches such as molecular dynamics (MD) simulations, docking simulations and structure-based drug design methodologies have contributed to GPCR drug discovery over recent decades [9–13]. Despite the successful history of GPCR drug discovery, researchers still lack the necessary mechanistic understanding at the molecular level to deorphanize some GPCRs and to optimize the structure–activity relationship (SAR) of GPCR ligands. For molecular insight, GPCRs are structurally grouped into six classes for analysis: Class A (rhodopsin), Class B (secretin, adhesion), Class C (glutamate), Class D (fungal mating pheromone), Class E (cyclic adenosine monophosphate) and Class F (frizzled and smoothened receptors) [14]. GPCRs share a common structural organization of seven transmembrane (7-TM) helices (TM1–TM7) connected by three extracellular loops (ECL1–ECL3) and three intracellular loops (ICL1–ICL3) [15]. Ligand recognition sites are located in the ECLs region (N-terminus), and upon ligand binding, the 7-TM bundle undergoes conformational changes, thereby transducing signals from the extracellular to the intracellular region [16]. Notably, the literature reports that the multiple conformations of GPCRs and their dynamics are responsible for the ability of activated GPCRs to regulate signaling pathways.
GPCR ligands can modulate conformational dynamics in the classic endogenous activation paradigm [1, 4]. In other words, ligands can regulate GPCR activity and function by conformational selection of distinct states [1].

In general, conformational dynamics cannot be directly observed from structures (X-ray or Cryo-EM) deposited in the RCSB/GPCRdb server. Therefore, collaborative research using crystallography, spectroscopy and computer simulations has been conducted to study fluctuations across (or within) GPCR conformational states. MD simulations are preferred to accurately predict the conformational states of an individual GPCR or to elucidate ligand association and dissociation from binding pockets [17]. Furthermore, ligand selectivity at the atomic level can be explained using MD simulations. For example, to demonstrate the activation of molecular switches, the contrasting agonistic and antagonistic actions of two diastereomers (of one chiral ligand) against the 5-HT1A receptor were simulated [18]. Similarly, MD simulations with increased sampling could decipher the different selectivity of ligands against A2a and A1 adenosine receptors resulting from the small substituents (-OH) of the ligands [19]. However, the computational cost of MD simulations prevents rapid prediction across a wide range of GPCRs and their universal comparison, especially with respect to ligand selectivity. To improve general predictions, machine learning (ML) methods have been used together with MD simulations or have replaced atomic-level simulations. For example, Kooistra et al. [20] proposed a classification model based on protein–ligand interaction fingerprints to predict partial or full agonists, antagonists or inverse agonists and decoys from only β1R or β2R receptor–ligand complexes (31 PDBs). Recently, Nicoli et al. [21] reported ECL-2-based clustering of Class A GPCRs by analyzing the shape, volume and conformations of MD-simulated trajectories. Although the description of conformational differences and intramolecular interactions provides valuable insights, it does not directly return any predicted results. Recently, Xie et al.
[22] reported a predictive model for ligand selectivity based on protein sequences and 2D chemical structures without any 3D information. A bi-class prediction of ligand-biased activation (G-protein/arrestin) was published by Sirimulla et al. [23]. Few large-scale experimental and computational studies have attempted to predict post-binding events (specific functions or pharmacological behaviors that occur after ligand binding), even though advanced ML algorithms and featurization have been employed in several studies to predict drug–target interactions (DTI) [2, 17, 24, 25] and binding affinities for GPCR ligands [26].

Meanwhile, when we consider the above-mentioned predictions for GPCR drug discovery as a part of DTI models, the history of reported DTI models is quite diverse across several research fields. While bipartite drug–target networks were generated using versatile systematic or integrated data as conventional DTI models in computational biology [27], DTI models in cheminformatics were built using atomic-level data of ligands (for ligand-based models) [28] or protein–ligand complexes (for structure-based models) [29]. The former have been valuable for understanding drugs’ effects on the whole system (especially drug repositioning from one target to another) [30] and the latter have been practically used for drug design and virtual screening [28, 29]. Moreover, DTI models can be grouped by their features: (1) chemo (ligand)-centric or protein (target)-centric and (2) 2D features such as protein sequence and molecular graph (2D structural formula) or 3D features such as the conformation of proteins or ligands. Recent advances in both open-source ML libraries and the availability of data sources allow diverse handcrafted features and automatically extracted features [31]. Generally, while a conventional DTI model is a classification model able to predict diverse nominal labels, recent studies describe the drug–target affinity (DTA) model as a kind of DTI model for predicting the binding affinity of drug–target complexes [32]. For example, the Cheng group’s review article on DTI models described three different DTI models based on the type of label information to be predicted: (1) protein–ligand binding sites, (2) ligand-binding affinity and (3) binding pose [32]. The required data and features differ according to the label. The review also emphasized the importance of interconnecting the separately discussed models. Figure 1 summarizes the known DTI models along with these factors and highlights that this study focuses on post-binding events using 3D molecular features.

Figure 1. Overview of DTI prediction.

Predicting post-binding events can provide valuable insights into clinical outcomes. Notably, clinical outcomes rely heavily on changes in receptor activity, particularly the functional selectivity of the ligand after ligand binding [22]. Furthermore, to the best of our knowledge, existing computational studies cannot directly guide the design of new GPCR drugs that can enhance or suppress specific signaling pathways. Replacing a substituent on GPCR ligands can substantially change their mode of action (e.g. modified target specificity or an agonism/antagonism switch). Thus, molecular features should be directly applicable to the SAR profiling of GPCR ligands. In this study, we examined the molecular characteristics of GPCR–ligand 3D complexes and predicted the functions of a wide range of GPCR ligands based on these features. For this purpose, we defined the ideal molecular features for GPCR drug discovery as those able to explain the following three characteristics:

  • (i) Flexibility of transmembrane helices at the residue level.

  • (ii) Mode of action of GPCR ligands at the pocket level.

  • (iii) Conformational adaptability of the ligands at the atom level.

3D-molecular featurization

We extracted valuable structural properties from the GPCR–ligand 3D complexes to obtain these ideal features. Notably, we attempted multilevel featurization by considering (1) a zoomed-out description of transmembrane helical motion, (2) a zoomed-in description of binding pockets, (3) 3D geometry and (4) noncovalent interaction (NCI) vectors. Our initial trial focused on the high conformational plasticity of GPCRs (active, intermediate and inactive). When an extracellular ligand binds to a GPCR, the receptor changes its shape (conformational state) and subsequently binds to intracellular proteins such as arrestin, G-protein and GPCR kinase, initiating signal transduction. Rearrangement of transmembrane helices (TM5–7) is known to be particularly important in signal transmission [2, 25, 33]. The molecular motion of GPCR TMs must therefore be captured. For this purpose, we allocated generic sequence numbers to the 3D GPCR–ligand complexes, and the NCIs of each complex were indexed to the generic number and interaction type (e.g. TM7–7.39_Aromatic_Edge or Face). This indexing method proved effective for the NCI comparison between different GPCRs because motifs crucial for GPCR activation (CWxP, NPxxY, DRY motifs and toggle switch W6x48) tend to be conserved and associated with generic numbers.
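The indexing idea can be illustrated with a minimal sketch. It assumes a hypothetical key format (`<generic number>_<interaction type>`) and a toy list of three residue positions; neither the key format nor the helper names (`nci_key`, `build_vocabulary`, `encode_complex`) come from the original work. Only the seven interaction types are taken from Table 1. The point is that two different receptors become directly comparable once their interactions are keyed by generic number instead of receptor-specific residue numbering.

```python
# Seven interaction types, as listed for INT_Feat in Table 1.
NCI_TYPES = [
    "Hydrophobic", "HBond_Lig", "HBond_Prot", "Ionic_Lig", "Ionic_Prot",
    "Aromatic_Face/Face", "Aromatic_Edge/Face",
]


def nci_key(generic_number, nci_type):
    """Build a feature key such as '7.39_Aromatic_Edge/Face' (hypothetical format)."""
    if nci_type not in NCI_TYPES:
        raise ValueError(f"unknown interaction type: {nci_type}")
    return f"{generic_number}_{nci_type}"


def build_vocabulary(generic_numbers):
    """Map every (generic number, NCI type) pair to a fixed column index."""
    keys = [nci_key(g, t) for g in generic_numbers for t in NCI_TYPES]
    return {key: i for i, key in enumerate(keys)}


def encode_complex(observed, vocabulary):
    """Return a binary vector with 1 wherever an interaction was observed."""
    bits = [0] * len(vocabulary)
    for generic_number, nci_type in observed:
        bits[vocabulary[nci_key(generic_number, nci_type)]] = 1
    return bits


# Toy example: three generic positions shared by two different receptors.
vocab = build_vocabulary(["3.32", "6.48", "7.39"])
complex_a = encode_complex(
    [("3.32", "Ionic_Prot"), ("7.39", "Aromatic_Edge/Face")], vocab
)
complex_b = encode_complex(
    [("3.32", "Ionic_Prot"), ("6.48", "Hydrophobic")], vocab
)
shared = [key for key, i in vocab.items() if complex_a[i] and complex_b[i]]
print(shared)  # ['3.32_Ionic_Prot']
```

Because every complex is projected onto the same column layout, the resulting vectors are sparse and can be compared or stacked across receptors.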

Second, we considered the possibility that GPCR ligands could bind to the same binding site yet exhibit different modes of action such as agonist, inverse agonist or antagonist. Although GPCRs activate multiple cellular pathways, a biased GPCR ligand can selectively stimulate particular pathways through its specific and unique binding mode. Therefore, we should capture the subtle differences, particularly in the NCI pattern, between the binding modes and featurize them. Certainly, the NCI pattern of helices can reflect the conformational plasticity of the helices through the absence or presence of NCIs between crucial motifs, which includes information about the distance (binary type) between paired residues. Moreover, ligands varying in size from small neurotransmitters to large proteins can bind to GPCR-binding sites. As a result, the shapes of the binding sites and their positions differ between GPCRs. Thus, we need to describe both the deep pockets of small molecules and the shallow extracellular binding sites of proteins/peptides. This required a zoomed-in approach as well as a zoomed-out one, ranging from the TM helices (for NCI patterns) to the binding cavities (for 3D geometry). We extracted the 3D-geometric coordinates of binding pockets using the FuzCav method developed by Weill et al. [34], which employs pharmacophoric triplets built from the Cartesian coordinates of Cα atoms. Briefly stated, FuzCav [34] encodes a pocket or cavity with nominal distances divided into five ranges (0.0–4.8, 4.8–7.2, 7.2–9.5, 9.5–11.9 and 11.9–14.3 Å) and six pharmacophoric properties of each amino acid (H-bond donor, H-bond acceptor, aromatic, aliphatic, positive ionizable and negative ionizable). Furthermore, we considered the 3D shapes of GPCR ligands for featurization. A method for predicting SAR should explain the conformational adaptability of the ligands, particularly the functions resulting from modified ligand structures.
While general quantitative SAR models are built from ligand feature(s) for a ‘specific’ target, we should benchmark 3D features of the models and quantify 3D bioactive conformations of GPCR ligands. For this purpose, the 3D fingerprint (E3FP) [35] of the ligands was used to capture the 3D molecular geometry. 3D atomic neighborhood patterns were captured regardless of the presence of covalent bonds between the atom pairs. Overall, in this study, we used conformation information from activated and inactivated GPCR complexes and mapped it into feature vectors using the four featurization logics. We conceptualized GPCR–ligand Interaction patterns, binding Pockets and Ligand features as a score (GPCR-IPL score), consisting of the zoomed-out feature of NCI (INT_Feat), the zoomed-in feature of binding site geometry (POCK_Feat) based on triad residues and the bioactive conformation of ligands (LIG_Feat), as shown in Table 1. Figure 2 depicts the workflow of this study, which includes the steps for data set collection, feature engineering, model training, validation and evaluation with feature importance analysis.
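To make the pocket-level encoding more concrete, the following sketch bins pairwise distances into the five ranges quoted above and describes a triplet of pharmacophoric points by its three binned edges. It is an illustration only and not the FuzCav implementation: the half-open boundary rule, the property codes used as labels, the function names and the canonical key layout are all assumptions made here.

```python
from bisect import bisect_right
from math import dist

# Upper edges (in Å) of the five distance ranges quoted in the text.
UPPER_EDGES = [4.8, 7.2, 9.5, 11.9, 14.3]


def distance_bin(d):
    """Return the 0-based index of the range containing d, or None beyond 14.3 Å.

    Ranges are treated as half-open [lower, upper); this rule is an assumption.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    index = bisect_right(UPPER_EDGES, d)
    return index if index < len(UPPER_EDGES) else None


def triplet_key(points):
    """Describe three (property, (x, y, z)) points by their three binned edges."""
    a, b, c = points
    edges = []
    for (prop1, xyz1), (prop2, xyz2) in ((a, b), (a, c), (b, c)):
        bin_index = distance_bin(dist(xyz1, xyz2))
        if bin_index is None:
            return None  # at least one pair is farther apart than 14.3 Å
        edges.append((tuple(sorted((prop1, prop2))), bin_index))
    # Sorting makes the key independent of the order of the three points.
    return tuple(sorted(edges))


# Toy triplet: aromatic, donor and negative ionizable points.
pts = [("Ar", (0.0, 0.0, 0.0)), ("Do", (5.0, 0.0, 0.0)), ("Ne", (0.0, 8.0, 0.0))]
print(triplet_key(pts))
# ((('Ar', 'Do'), 1), (('Ar', 'Ne'), 2), (('Do', 'Ne'), 2))
```

Counting how often each key occurs over all residue triplets of a pocket would give a fixed-length count vector, which is the general shape of a triplet-based cavity fingerprint.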

Table 1. Feature description of GPCR-IPL score

Name | Feature composition | Bit size (sparse) | Feature label
INT_Feat | 7 Features Assigned to Each NCI | 5439 | Hydrophobic, HBond_Lig, HBond_Prot, Ionic_Lig, Ionic_Prot, Aromatic_Face/Face, Aromatic_Edge/Face
POCK_Feat | 6 Features Assigned to Each Triad Residues with 5 Distance Ranges | 4833 | Aliphatic (Ap), Donor (Do), Acceptor (Ac), Aromatic (Ar), Positive (Po), Negative (Ne)
LIG_Feat | 3D Geometry (Conformation) of Ligand | 1024 | The encoded bitvector of shells generated from covalent and non-covalent bonded atoms
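As a quick check of the dimensions involved, the sketch below combines the three feature blocks into the seven feature sets F01 to F07 (as abbreviated in the figure and table captions of this article) and reports the resulting vector lengths. The bit sizes are those of Table 1; plain concatenation in the order INT_Feat, POCK_Feat, LIG_Feat is an assumption, as are the helper names.

```python
# Bit sizes from Table 1.
FEATURE_BITS = {"INT_Feat": 5439, "POCK_Feat": 4833, "LIG_Feat": 1024}

# Feature-set abbreviations as given in the captions of this article.
FEATURE_SETS = {
    "F01": ("INT_Feat", "POCK_Feat", "LIG_Feat"),
    "F02": ("INT_Feat", "POCK_Feat"),
    "F03": ("INT_Feat", "LIG_Feat"),
    "F04": ("POCK_Feat", "LIG_Feat"),
    "F05": ("INT_Feat",),
    "F06": ("POCK_Feat",),
    "F07": ("LIG_Feat",),
}


def combined_length(set_name):
    """Length of the concatenated vector for one feature set."""
    return sum(FEATURE_BITS[block] for block in FEATURE_SETS[set_name])


def concatenate(blocks, set_name):
    """Concatenate per-block vectors in a fixed order (assumed, not confirmed)."""
    vector = []
    for name in FEATURE_SETS[set_name]:
        block = blocks[name]
        if len(block) != FEATURE_BITS[name]:
            raise ValueError(f"{name} has {len(block)} bits, expected {FEATURE_BITS[name]}")
        vector.extend(block)
    return vector


# Placeholder all-zero blocks stand in for real fingerprints.
blocks = {name: [0] * size for name, size in FEATURE_BITS.items()}
print(combined_length("F01"))            # 11296
print(len(concatenate(blocks, "F02")))   # 10272
```

With only a few hundred complexes and more than ten thousand sparse columns in F01, the later checks for overfitting are a natural part of the workflow.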

Figure 2. The workflow for GPCR-IPL score.

METHODS AND MATERIALS

Data sets

The initial data sets were obtained from GPCRdb (574 3D complexes) [36] and GPCR-EXP (363 3D complexes) [37]. After merging the two data sets, we excluded apoproteins, complexes with large peptide ligands and duplicate PDB entries. Finally, we used 396 protein–ligand complexes for training; Tables S1–S4 provide a comprehensive description of the data set. Each 3D structure was annotated with specific tag information, including receptor family, class, species, methods, ligand type and function. The functionalities of the GPCR ligands were classified into five categories: agonist, partial agonist, antagonist, allosteric antagonist and inverse agonist. For the purpose of predicting the functional selectivity of ligands, we categorized agonists, partial agonists and allosteric agonists as Class0 (agonist class: 168 complexes). Conversely, antagonists, allosteric antagonists and inverse agonists were grouped into Class1 (antagonist class: 218 complexes). After GPCRdb was updated, the additional 175 complexes were used for out-of-set validation. Moreover, another out-of-set validation used complexes of 23 different Class A GPCRs present in neither GPCRdb nor GPCR-EXP. Thirteen of these GPCRs were taken from the RCSB Protein Data Bank (PDB; www.rcsb.org), whereas PDB files for the other 10 GPCRs were obtained from the AlphaFold server (www.alphafold.ebi.ac.uk), highlighting the use of advanced computational protein structure predictions. In parallel, we collected 5886 ligands for the 23 different GPCRs from the ChEMBL, GLASS and IUPHAR/BPS Guide to Pharmacology databases (Tables S8 and S9). For the prediction of biased activation of ligands, 96 GPCR–ligand pairs were available from the total of 396 complexes. The activation of Gi/o (Gi1, Gi2, Gi3, Go, Gt1, Gt2 and Gt3) was assigned to Class0. The activation of Gq/11 (Gq and G11) and of Gs (Gs) was assigned to Class1 and Class2, respectively.
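The class assignment described in this section amounts to two lookup tables, sketched below. The groupings follow the text; the lower-cased string labels, the dictionary names and the two helper functions are illustrative assumptions and do not reflect the actual annotation format of the data set.

```python
# Binary task: functional selectivity (Class0 = agonist class, Class1 = antagonist class).
FUNCTION_CLASS = {
    "agonist": 0,
    "partial agonist": 0,
    "allosteric agonist": 0,
    "antagonist": 1,
    "allosteric antagonist": 1,
    "inverse agonist": 1,
}

# Multiclass task: biased activation, grouped by G-protein family.
G_PROTEIN_CLASS = {
    **{subtype: 0 for subtype in ("Gi1", "Gi2", "Gi3", "Go", "Gt1", "Gt2", "Gt3")},  # Gi/o
    **{subtype: 1 for subtype in ("Gq", "G11")},                                     # Gq/11
    "Gs": 2,                                                                          # Gs
}


def functional_class(annotation):
    """Map a ligand-function annotation to Class0/Class1 (case-insensitive)."""
    return FUNCTION_CLASS[annotation.strip().lower()]


def biased_activation_class(subtype):
    """Map a G-protein subtype to its family class (0: Gi/o, 1: Gq/11, 2: Gs)."""
    return G_PROTEIN_CLASS[subtype.strip()]


print(functional_class("Inverse agonist"))   # 1
print(biased_activation_class("G11"))        # 1
```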

Geometry optimization of protein–ligand complexes

The protein–ligand complexes were optimized using Szybki from OpenEye Scientific Software [38]. The default force-field parameters amberff14sb [38] were used for the protein and AM1BCC charges [38] for the ligand, with the flexibility of amino acid residues extended to 7.0 Å from the bound ligand. The Cartesian coordinates of the ligand were also adjustable during the optimization.

In silico data generation of typical Class A GPCR–ligand complexes for out-of-set validation

Ligands corresponding to 13 typical Class A GPCRs were obtained from the ChEMBL, GLASS and IUPHAR/BPS Guide to Pharmacology databases based on the data quality. The Konstanz Information Miner was used to process punctual bioassay information of the data sets on direct binding and bioactivities. In order to generate 3D complexes of the ligands, each GPCR target was assigned to a specific PDB structure (Table S5). LigPrep was employed to generate the 3D structures of the ligands, ensuring their optimal conformation before molecular docking. The protein structures were prepared using the Protein Preparation Wizard. Binding sites were defined based on the bound ligand; however, for proteins from AlphaFold, the SiteMap tool was used. Molecular docking simulations of the 3D ligands to the PDBs were performed using the SP (Standard Precision) scoring function of Glide, which balances speed with accuracy (ACC) and is well suited to large-scale docking studies. After docking, molecules labeled as agonists, antagonists or inverse agonists were selected for further analysis. The resulting docked configurations of the ligand and protein complexes were subjected to feature generation, producing INT_Feat, POCK_Feat and LIG_Feat (Figure S1).

MD simulations of MRGPRX complex for newly designed GPCR drugs

Molecular docking simulations of in-house compounds with MRGPRX (PDB ID: 7S8O) [39] were performed using Schrödinger Suite 2020 [40]. The simulations were conducted with a protein grid size of 20 Å centered on the centroids of amino acids verified through mutational studies, using the Glide scoring function in extra precision mode. Then, MD simulations of the docked complexes were conducted using Desmond to assess protein–ligand interactions under solvated conditions [15]. Docked protein–ligand complexes were embedded in a POPC membrane model with TIP3P water molecules and neutralized with counterions at a physiological salt concentration of 0.15 M. Following minimization, the system was run for 100 ns in the NPγT ensemble (constant particle number, pressure, surface tension and temperature) at 300 K with a Nose–Hoover thermostat and a Martyna–Tobias–Klein barostat set at 1.01 bar. The stability of the protein–ligand interaction and the underlying dynamic interactions were analyzed by monitoring the root mean square fluctuations, root mean square deviation, MM-GBSA, potential energy and intermolecular hydrogen bond interactions [41]. Finally, the frames with the lowest and highest MM-GBSA scores were chosen to evaluate the predictability (agonist or antagonist) of the ML models.

ML model architecture

Following the extraction of protein and ligand features as input feature matrices, ML models were constructed using a variety of conventional ML methods: linear, ensemble and neural-network. For our binary and multiclass prediction, the same set of ML methods was used and the performance metrics in each case were compared. The ML methods include Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Xtra-Gradient Boost Tree (XGB), Logistic Regression (LR), Gaussian Naïve Bayes (GNB), Multilayer Perceptron, Stochastic Gradient Descent (SGD), Light Gradient Boost (LGBM) and CatBoost. Unless otherwise specified, all learning models were developed using popular packages such as Scikit-learn [42], CatBoost [43] and LightGBM [44] and utilized default parameters.

Development and evaluation of the model

We performed a 10-fold cross-validation on the entire data set to assess the robustness of the performance of each ML model. The model was then built using stratified, randomized 80:20 splits for training and testing. The test set was used to evaluate the performance of the GPCR-IPL model after training on the training set. Binary and multiclass classification models were constructed, trained and tested to predict each class. We report the following performance metrics to evaluate each model and class: ACC, Matthews correlation coefficient (MCC) [45], macro-averaged F1 score (F1) [46], area under the receiver operating characteristic (auROC) [47] and area under the precision-recall curve (auPR) [48].
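A stratified, randomized 80:20 split keeps the class ratio the same in both partitions. The standard-library sketch below shows the idea using the class sizes reported for the binary task (168 and 218 complexes); the function name, the seed and the rounding rule are assumptions, and in practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) preserving the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)                       # randomize within each class
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Class sizes reported for the binary task: 168 (Class0) and 218 (Class1).
labels = [0] * 168 + [1] * 218
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 308 78
```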

RESULTS AND DISCUSSIONS

The generated GPCR-IPL feature sets were used in this investigation to predict the post-binding events of GPCR ligands. For this purpose, binary and multiclass classification models were generated under various conditions (different feature sets, ML algorithms and geometry optimization). The predictive power and robustness of the predictive models verified the competence of the feature sets. Furthermore, we examined how the GPCR-IPL features achieved high ACC and identified the features that contributed significantly to the prediction. In three scenarios, we applied GPCR-IPL to out-of-set data (part of neither the training nor the test set) and used it in the development of new MRGPRX antagonists. Because MRGPRX is atypical, with less conserved key motifs than the other members of Class A, the successful development further confirmed the practical utility of GPCR-IPL, as also evidenced by the out-of-set validation. Moreover, MRGPRX antagonists regulate histamine-independent itch pathways, so this work paves the way for innovative pruritus therapeutics. Finally, GPCR-IPL was compared with state-of-the-art methods for GPCR drugs.

GPCR-IPL feature predicted the functional selectivity of GPCR ligands

The multilevel feature sets, namely INT_Feat, POCK_Feat and LIG_Feat, were evaluated either individually or in combination. If a feature set can characterize GPCR ligands (agonist/inverse/antagonist), their functions can also be predicted. Thus, we evaluated the respective feature sets based on their ability to categorize GPCR ligands by function. Classification models were built for this purpose using 10 distinct ML algorithms and statistically validated using 10-fold cross-validation of the training data and external validation of the test data. Using a confusion matrix, performance evaluation metrics, including the MCC, auROC, auPR and F1, were calculated for each model. Subsequently, the best model from each feature category, specifically from F01 to F07, was identified. A comparative analysis was then performed on the selected models (Figure 3). Despite a slight imbalance in the ratio between the two classes (Class 0 for agonists: 1.00; Class 1 for antagonists: 1.43), the statistical metrics consistently showed that the predictive ability of the models was reliable regardless of the ML algorithms or feature sets used (Table S5). For example, while the CatBoost classifier with all combined feature sets (INT_Feat + POCK_Feat + LIG_Feat) presented slightly superior performance compared with other binary classifiers (auPR = 0.860) in the internal validation, the SVM classifier with the same feature sets showed comparable performance (auPR = 0.846) in the internal validation and the best performance (auPR = 0.891) in external validation. The generalizability of the predictive models was further validated by averaging the predictions over the 10 folds. The robustness and predictability of the models were maintained after external validation of the test data (Figure 4).
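For readers who want to see how two of these metrics follow from a confusion matrix, the sketch below computes the MCC and the macro-averaged F1 for a binary problem using their standard definitions. The toy labels are invented for illustration and are unrelated to the study data; the function names are likewise arbitrary.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp, fp, fn):
    """F1 score of one class from its own tp, fp and fn counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(tp, fp, tn, fn):
    """Average of the F1 of the positive class and the F1 of the negative class."""
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2


# Toy labels (1 = antagonist class, 0 = agonist class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)             # (3, 1, 3, 1)
print(mcc(*counts))       # 0.5
print(macro_f1(*counts))  # 0.75
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it a reasonable choice when the two classes are not perfectly balanced.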

Figure 3. The best predictive performance of GPCR-IPL feature sets. Abbreviations: F01: INT_Feat + POCK_Feat + LIG_Feat; F02: INT_Feat + POCK_Feat; F03: INT_Feat + LIG_Feat; F04: POCK_Feat + LIG_Feat; F05: INT_Feat; F06: POCK_Feat; F07: LIG_Feat.

Figure 4. Compared auROC and auPR across diverse ML algorithms. Abbreviations: M01: SVM; M02: RF; M03: DT; M04: XGB; M05: LR; M06: GNB; M07: Multilayer Perceptron; M08: SGD; M09: LGBM; M10: CatBoost Classifier.

The improved performance of GPCR-IPL score and ablation study

The molecular interactions between a target and its ligand depend on the shape and electrostatic properties of both the binding site surfaces and the ligand molecules [49]. In other words, conformational dynamics affect the interactions. The 3D structural data used in this study, obtained from the RCSB/GPCRdb [36] and GPCR-EXP (Zhang Lab) [37] data sets, represent a part of the conformational dynamics. Based on the aforementioned findings, we were motivated to further explore the predictive ability rooted in molecular geometry. We were particularly interested in determining whether in situ optimization of molecular geometry can improve the predictive power. For this purpose, we optimized the molecular geometry of the 3D GPCR–ligand complexes under the amberff14sb force field and AM1BCC charges (for ligands) using the Broyden–Fletcher–Goldfarb–Shanno algorithm to reach energy convergence [38]. Subsequently, the changes in shape and electrostatic information resulting from the optimized geometry were used to recompute the values in the GPCR-IPL feature matrix. We named the set of GPCR-IPL features computed from the geometry-optimized complexes ‘optimized GPCR features’. The optimized GPCR features demonstrated superior performance to the non-optimized GPCR features in several feature sets including F01 and F02 (Figure 5 and Tables 2 and S6). Notably, all performance metrics of the Top2 predictive models revealed significant differences between non-optimized and optimized GPCR features (Figure 6).

Figure 5. Compared performance metrics of Top2 predictive models between Non-Optimized and Optimized models. Abbreviations: NON_OPT: the non-optimized GPCR features from F02 and F03; OPT: the optimized GPCR features from F01 and F02.

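Table 2. Performance comparison of test set predictions between non-optimized and optimized GPCR features

Data sets | Feature sets | ML method | MCC | auROC | auPR | auPR (Class0) | auPR (Class1) | F1 | F1 (Class0) | F1 (Class1)
Non-optimized (PDB) | F01 | CATB | 0.821 | 0.912 | 0.872 | 0.840 | 0.905 | 0.910 | 0.900 | 0.920
Non-optimized (PDB) | F02 | SGD | 0.773 | 0.891 | 0.841 | 0.795 | 0.888 | 0.890 | 0.870 | 0.900
Non-optimized (PDB) | F03 | XGB | 0.663 | 0.830 | 0.774 | 0.728 | 0.820 | 0.830 | 0.800 | 0.860
Non-optimized (PDB) | F04 | LR | 0.797 | 0.901 | 0.857 | 0.817 | 0.897 | 0.900 | 0.880 | 0.910
Non-optimized (PDB) | F05 | XGB | 0.802 | 0.906 | 0.859 | 0.812 | 0.907 | 0.900 | 0.890 | 0.910
Non-optimized (PDB) | F06 | RF | 0.848 | 0.927 | 0.891 | 0.857 | 0.924 | 0.920 | 0.910 | 0.930
Non-optimized (PDB) | F07 | XGB | 0.663 | 0.830 | 0.774 | 0.728 | 0.820 | 0.830 | 0.800 | 0.860
Optimized geometry | F01 | LR | 0.872 | 0.938 | 0.907 | 0.882 | 0.933 | 0.940 | 0.930 | 0.950
Optimized geometry | F02 | LR | 0.848 | 0.927 | 0.891 | 0.857 | 0.924 | 0.920 | 0.910 | 0.930
Optimized geometry | F03 | RF | 0.872 | 0.929 | 0.906 | 0.899 | 0.913 | 0.930 | 0.920 | 0.950
Optimized geometry | F04 | CATB | 0.819 | 0.907 | 0.871 | 0.846 | 0.896 | 0.910 | 0.890 | 0.930
Optimized geometry | F05 | MLP | 0.767 | 0.882 | 0.837 | 0.805 | 0.870 | 0.880 | 0.860 | 0.910
Optimized geometry | F06 | LGB | 0.779 | 0.895 | 0.845 | 0.791 | 0.898 | 0.890 | 0.870 | 0.900
Optimized geometry | F07 | RF | 0.794 | 0.897 | 0.855 | 0.822 | 0.887 | 0.900 | 0.880 | 0.910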

Abbreviations: F01: INT_Feat + POCK_Feat + LIG_Feat; F02: INT_Feat + POCK_Feat; F03: INT_Feat + LIG_Feat; F04: POCK_Feat + LIG_Feat; F05: INT_Feat; F06: POCK_Feat; F07: LIG_Feat; XGB: Xtra-Gradient Boost Classifier; MLP: Multilayer Perceptron; LGB: Light Gradient Boost Machine Classifier; CATB: CatBoost Regressor.

Figure 6. Bias-variance metrics for various Non-Optimized and Optimized GPCR-IPL features. The low bias and variance values suggest the absence of overfitting and underfitting.

The differences resulting from the optimized GPCR features were further examined through bias-variance decomposition analysis to ensure that neither overfitting nor underfitting occurred. Bias is defined as the difference between the expected value of the estimator ($\hat{\theta}$) and the parameter (θ) that we want to estimate:

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

Similarly, the variance is defined as the difference between the expected value of the squared estimator and the squared expectation of the estimator:

$$\mathrm{Var}(\hat{\theta}) = E[\hat{\theta}^{2}] - \left(E[\hat{\theta}]\right)^{2}$$

According to the bias-variance tradeoff, ‘high variance’ is associated with overfitting, whereas ‘high bias’ is associated with underfitting [50]. Figure 6 shows that the estimated bias and variance of most models fall within a minimal range. The best model from the optimized F01 feature had an estimated bias-variance tradeoff of 0.062–0.057. Furthermore, the results also show that the other models were neither overfitted nor underfitted. The performance of the optimized GPCR features, as mentioned above, clearly shows that the predictive power for functional selectivity can be further improved through feature encoding following in situ optimization or short/long-range MD simulations.
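The definitions of bias and variance given above can be checked numerically. The sketch below estimates both quantities from repeated estimates of a known parameter and verifies the classical identity that the mean squared error equals the squared bias plus the variance. It illustrates the estimator-level definitions only; the decomposition procedure actually applied to the classifiers in this study is not specified in the text and is not reproduced here, and the numbers are invented.

```python
from statistics import fmean


def empirical_bias(estimates, theta):
    """Bias: mean of the estimates minus the true parameter."""
    return fmean(estimates) - theta


def empirical_variance(estimates):
    """Variance: mean of the squared estimates minus the squared mean."""
    mean = fmean(estimates)
    return fmean([e * e for e in estimates]) - mean * mean


def mean_squared_error(estimates, theta):
    """Average squared deviation of the estimates from the true parameter."""
    return fmean([(e - theta) ** 2 for e in estimates])


# Invented estimates of a parameter whose true value is taken to be 0.9.
estimates = [0.9, 1.1, 1.0, 1.2, 0.8]
theta = 0.9

bias = empirical_bias(estimates, theta)
variance = empirical_variance(estimates)
mse = mean_squared_error(estimates, theta)

# MSE = bias^2 + variance holds up to floating-point rounding.
print(abs(mse - (bias ** 2 + variance)) < 1e-12)  # True
```

A model that scores well on training data but shows large variance across resampled training sets would be flagged as overfitted by this kind of analysis, whereas a large bias points to underfitting.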

Thereafter, we performed an ablation study to show (1) whether the GPCR-IPL score retains reliable performance when the hyperparameters change, (2) whether the chosen feature set F01 is truly the best under randomly generated hyperparameters and (3) whether the performance in Table 2 can be confirmed a second time. Figure 7A presents the ablation study on hyperparameters. The F01 feature set was trained with 10-fold cross-validation (left, blue) under 100 randomly chosen hyperparameter configurations for each ML method and compared with the original model (right, orange) of the geometry-optimized F01 in Table S6. Hyperparameter configurations are available in Table S11. Then, the influence of a single hyperparameter on the performance metrics was examined (Figure 7B). These ablation results demonstrated that the predictive performance (of Table 1 and Figure 5) is robust and is not sensitive to hyperparameter modification. Moreover, the optimized features of Table 1 were trained using the best ML method for each feature set under 100 randomly chosen hyperparameter configurations in the search space of Table S11. The auROC distribution in Figure 7C confirmed once more that F01 is the best feature set and also verified the performance metrics of Table 1.
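The random hyperparameter ablation can be sketched as follows. The search space shown is purely hypothetical (the actual space is listed in Table S11, which is not reproduced here), and `evaluate` is a placeholder for whatever cross-validated scoring routine is used; only the overall pattern of drawing 100 random configurations and summarizing the spread of the resulting scores reflects the text.

```python
import random

# Hypothetical search space for a logistic-regression-like model.
# The real search space is given in Table S11 and may differ entirely.
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "penalty": ["l1", "l2"],
    "max_iter": [100, 200, 500],
}


def sample_configurations(space, n=100, seed=0):
    """Draw n configurations uniformly at random (with replacement)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]


def run_ablation(configurations, evaluate):
    """Score every configuration with a user-supplied evaluation callable."""
    return [evaluate(config) for config in configurations]


def summarize(scores):
    """Minimum, median and maximum of the collected scores."""
    ordered = sorted(scores)
    return {"min": ordered[0], "median": ordered[len(ordered) // 2], "max": ordered[-1]}


configs = sample_configurations(SEARCH_SPACE)


def toy_evaluate(config):
    """Stand-in scorer that returns a constant; replace with real cross-validation."""
    return 0.9


print(len(configs))                                   # 100
print(summarize(run_ablation(configs, toy_evaluate)))
```

A narrow spread between the minimum and maximum scores over such random draws is what would support the claim that a model is not sensitive to its hyperparameters.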

Figure 7. Ablation study on hyperparameters. (A) Performance of ML methods for F01 feature under randomly chosen hyperparameters, (B) performance of LR model of F01 across the change of single hyperparameter and (C) hyperparameter screening of each feature sets. Abbreviations: HM: Hyper-parameterized Model; OM: Original Model.

Characterization of key features for GPCR ligand selectivity

We wanted to know not only ‘how well functional selectivity is predicted’ but also ‘why it is well predicted’. To address the latter, we studied the feature importance of the predictive models. First, we used the SHapley Additive exPlanation (SHAP) [51] method, which can explain any predictive model regardless of its learning algorithm, including black-box models such as deep learning. In particular, SHAP measures the contribution of individual features and thereby provides insight into how to improve a predictive model and which features are important for a prediction. The best-performing logistic model of the optimized GPCR-IPL (INT_Feat + POCK_Feat + LIG_Feat) was analyzed. The features provide interactive descriptions of the (1) interaction types of protein–drug pairs, (2) 3D location of the respective triad residues and (3) quantified feature importance (SHAP value of features). SHAP values have a magnitude (size) and a sign (negative or positive). A feature with a larger absolute SHAP value contributes more to the prediction. A positive SHAP value supports the positive prediction, whereas a negative SHAP value opposes it. Ideally, if a GPCR–ligand complex had Top features with high positive SHAP values, it was predicted to be an antagonist (positive class). Similarly, if another complex lacked features with positive SHAP values, it tended to be predicted as an agonist. Because the SHAP values of a feature vary among datapoints, we summarized the varying SHAP values of the Top20 features across the data set. The Top20 ranked features for both the training (Figure 8A) and test sets (Figure 8B) are described, and the common features are highlighted in the summary plots. Thus, the summary plots revealed the feature importance pattern of the chosen model. 
According to the plots in Figure 8, POCK_Feat has the most dominant effect on the functional selectivity of the ligands, and INT_Feat cannot be ignored when predicting the ligand function. Although LIG_Feat contributed the fewest features to the Top20, bit319 of LIG_Feat clearly contributed, ranking Top5 or Top6.
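
For a logistic model, the magnitude and sign of SHAP values can be illustrated in closed form. The sketch below assumes independent features, in which case the SHAP value of feature i in log-odds space is w_i (x_i − mean_i) and the values add up to the difference between the prediction and the mean prediction. The weights, bias and feature vectors are toy values chosen for illustration and are not taken from the trained GPCR-IPL model; linear_shap is a hypothetical helper, not part of the released code.

```python
def linear_shap(weights, bias, background, x):
    """SHAP values of a linear log-odds model, assuming independent features.

    Returns (base_value, phi), where base_value is the mean log-odds over the
    background set and phi[i] = weights[i] * (x[i] - mean_i).
    """
    n_rows = len(background)
    means = [sum(row[j] for row in background) / n_rows
             for j in range(len(weights))]
    base_value = bias + sum(w * m for w, m in zip(weights, means))
    phi = [w * (xj - m) for w, xj, m in zip(weights, x, means)]
    return base_value, phi


# Toy values, NOT taken from the trained GPCR-IPL model.
feature_names = ["7.39x39_HYD", "3.33x33_HYD", "lig_bit319"]
weights = [1.5, -2.0, 0.5]
bias = -0.2
background = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]  # binary feature vectors
x = [1, 0, 1]  # the complex to explain

base_value, phi = linear_shap(weights, bias, background, x)
log_odds = bias + sum(w * xj for w, xj in zip(weights, x))

# Rank features by absolute contribution, as in a summary of Top features.
ranking = sorted(zip(feature_names, phi),
                 key=lambda item: abs(item[1]), reverse=True)
for name, value in ranking:
    print(f"{name}: {value:+.2f}")
```

In this toy case, the absence of a feature with a negative weight yields a positive SHAP value, so a missing ‘agonistic’ interaction can push the prediction toward the positive (antagonist) class.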

Figure 8.

Top 20 important features of the best predictive model for the (A) training set and (B) testing set. Symbolized INT_Feat: ‘7.39x39_HYD’ means that generic residue No. 7.39 of TM7 has hydrophobic interaction with a ligand. Similarly, ‘20.20x20_HYD’ means that generic residue No. 20 of ECL has the hydrophobic interaction with a ligand. Symbolized POCK_Feat: bit number of the feature among triad residues. The top shared features in both training and testing are highlighted in bold.
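
The symbolized INT_Feat names in the caption can be split into their parts programmatically. The sketch below is a hypothetical parser, assuming that a name such as ‘7.39x39_HYD’ consists of a segment number, a generic position, the number after ‘x’ (read here as the GPCRdb-style position) and an interaction-type suffix; IntFeat and parse_int_feat are illustrative names, not part of the GPCR-IPL code.

```python
import re
from typing import NamedTuple


class IntFeat(NamedTuple):
    segment: int          # number before the dot (7 for TM7 in '7.39x39_HYD')
    position: int         # generic position after the dot
    gpcrdb_position: int  # number after the 'x' (assumed GPCRdb-style position)
    interaction: str      # interaction-type suffix, e.g. 'HYD'


_PATTERN = re.compile(r"^(\d+)\.(\d+)x(\d+)_([A-Za-z]+)$")


def parse_int_feat(name):
    """Split a symbolized INT_Feat name into its components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a symbolized INT_Feat name: {name!r}")
    segment, position, gpcrdb_position, interaction = match.groups()
    return IntFeat(int(segment), int(position),
                   int(gpcrdb_position), interaction.upper())


print(parse_int_feat("7.39x39_HYD"))
print(parse_int_feat("20.20x20_HYD"))
```

Names that do not follow this pattern, such as POCK_Feat bit identifiers, are rejected with a ValueError rather than parsed.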

Even though the plots highlight features critical for the functional selectivity of GPCR ligands, they require further interpretation to provide structural insight at the atomic level. Thus, the distinguishing features of the complexes should be visualized and used to propose a drug design. The 5-hydroxytryptamine receptor (5-HT) is one such example, as it is involved in a wide variety of physiological functions, including cognition and emotion, vascular and smooth muscle contraction, platelet aggregation, gastrointestinal function and reproduction. Currently, 5-HT ligands are therapeutic candidates for migraine, depression and obesity. Meanwhile, two selectivity issues (subtype and functional selectivity) remain to be solved. Thus, we chose the 5-HT2B-Antagonist complex for further investigation. Among the five INT_Feat of the Top20, Val3.33 (of the TM3–3.33_Hyd feature) is adjacent to Asp3.32, a highly conserved residue in the orthosteric binding pocket of class A GPCRs. The highest ‘negative’ SHAP value of TM3–3.33_Hyd indicates that this feature is associated with agonistic function. In contrast, the highest ‘positive’ SHAP value of Val7.39 (of the TM7–7.39_Hyd feature) strongly suggests antagonist design (Figure 9A). The latter differs considerably from Roth’s structural analysis [52], which compared methysergide (an antagonist) with methylergonovine (an agonist) (PDB: 6DRY versus 6DRZ). While the key residues suggested for functional selectivity in the literature are ‘Ala5.46 and Thr3.37’, GPCR-IPL suggested ‘Val7.39 together with Phe6.51 and Trp6.48’ (Figure 9B). Next, POCK_Feat suggested triad residues (of poc_bit1643) surrounding the two N-methyl substituents (Figure 9C). Finally, our LIG_Feat emphasizes the retention of the circled fragment of the didehydroergoline core (lig_bit911) for antagonist design, whereas the ligand feature in the literature focuses on the difference of one N1-methyl substituent (Figure 9D).

Figure 9.

Model interpretation through 5-HT2B-Antagonist complex. (A) The waterfall plots to visualize the ranked important features (x-axis: SHAP value, y-axis: type of feature in the order of importance), (B) 3D location of Top5 features (hydrophobic interactions and their residues with generic numbers) in INT_Feat feature vector, (C) 3D location of ‘poc_bit1643’ in POCK_Feat feature vector and the identified triad residues and (D) Rendering of 3D geometry from LIG_Feat feature vector and the identified shell of ‘lig_bit911’.

GPCR-IPL features also predict the biased activation of GPCR ligands

The encouraging results of GPCR-IPL prompted us to further investigate the potential of our feature sets. Thus, we applied GPCR-IPL to examine the biased activation of GPCR ligands. The Class A–F data sets were used to build classification models for predicting biased activation. The biased activation of GPCR ligands was classified into four types: Gi/o (Gi1, Gi2, Gi3, Go, Gz, Gt1, Gt2 and Ggust), Gq/11 (Gq, G11, G14 and G15), Gs (Gs and Golf) and β-arrestin [36]. Notably, the design of our model is distinct from those of known models using GPCR ligands. For example, BiasNet was trained using only ligand structures, represented redundantly by eight types of 2D molecular fingerprints and physicochemical descriptors, which raises concerns about feature collinearity [23]. Our model was created from the multilevel feature sets of GPCR–ligand 3D complexes, with no duplication between features of different levels. More importantly, BiasNet only distinguished G-protein activation from β-arrestin activation. However, the signaling outcomes of the different G-proteins also differ: in contrast to Gs activation, which stimulates a signal, Gi/o activation inhibits it. Therefore, we built multi-class models predicting the different activations of Gi/o, Gq/11 and Gs (Figure 10A). The internal validation (using 10-fold cross-validation of the training data) and external validation (of the test data) verified their predictive ability, as evidenced by the statistical metrics (Figure 10B and Table S7). The ACC was >0.900 (external: 0.844), and the MCC was >0.823 (external: 0.738) in every feature set. The best feature set produced an auROC of 0.930 (external: 0.844) and auPR of 0.887 (external: 0.742). As shown in Figure 10C, Gi/o and Gs activation required different INT_Feat features (TM6–6.55_Hyd and TM6–6.48_Hyd, respectively). 
The 3D location of TM6–6.55_Hyd was visualized with the identified atom of E296.55 and the C5 carbon of the phenyl group in lasmiditan (Figure 10D). The two POCK_Feats (poc_bit: 2924, 2490) were the second most important features for Gi/o activation after the Top1 INT_Feat. A summary of the Top20 features for the training and test sets is shown in Figure S2.
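
The grouping of G-protein subtypes into classes and the multi-class metrics reported above can be sketched as follows. The dictionary reproduces the subtype grouping listed in the text; the labels and predictions are hypothetical, and multiclass_mcc is an illustrative implementation of the standard multi-class generalization of MCC computed from class counts, not the code used in the study.

```python
from math import sqrt

# Grouping of G-protein subtypes into families, as listed in the text.
G_PROTEIN_FAMILY = {
    "Gi1": "Gi/o", "Gi2": "Gi/o", "Gi3": "Gi/o", "Go": "Gi/o",
    "Gz": "Gi/o", "Gt1": "Gi/o", "Gt2": "Gi/o", "Ggust": "Gi/o",
    "Gq": "Gq/11", "G11": "Gq/11", "G14": "Gq/11", "G15": "Gq/11",
    "Gs": "Gs", "Golf": "Gs",
}


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from class counts."""
    labels = sorted(set(y_true) | set(y_pred))
    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = [sum(t == k for t in y_true) for k in labels]
    pred_counts = [sum(p == k for p in y_pred) for k in labels]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # By convention, return 0.0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical labels and predictions, for illustration only.
subtypes_true = ["Gi1", "Go", "Gq", "G11", "Gs", "Golf"]
y_true = [G_PROTEIN_FAMILY[s] for s in subtypes_true]
y_pred = ["Gi/o", "Gi/o", "Gq/11", "Gs", "Gs", "Gs"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mcc = multiclass_mcc(y_true, y_pred)
print(f"ACC = {accuracy:.3f}, MCC = {mcc:.3f}")
```

For two classes, this expression reduces to the familiar binary MCC, so the same function can serve both the functional-selectivity and the biased-activation setting.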

Figure 10.

GPCR-IPL score for predicting the biased activation of GPCR ligands. (A) Data set composition of three classes (Gi/o, Gq/11 and Gs). (B) Predictive performance with statistical metrics. (C) Feature importance analysis of representative complexes: 5-HT1F and Gi/o, β3-AR and Gs, M1R and Gq/11, respectively. (D) Binding site residues of the 5-HT1F and Gi/o complex identified from POCK_Feat (poc_bit: 2924, 2490) and INT_Feat (TM6–6.55_Hyd of E296.55), which contribute positively to Gi/o prediction. Abbreviations: F01: INT_Feat + POCK_Feat + LIG_Feat; F02: INT_Feat + POCK_Feat; F03: INT_Feat + LIG_Feat; F04: POCK_Feat + LIG_Feat.

Out-of-set validation and application of GPCR-IPL score with three scenarios

The promising results encouraged us to apply GPCR-IPL to practical drug discovery. Therefore, we considered it critical to expand the applicable scope of GPCR-IPL from real data (of known drugs) to in silico data (of newly designed or unprecedented drugs). The 3D complex structures used in GPCR-IPL encompass a more comprehensive array of molecular pharmacological data and provide deeper insights than 2D data. In particular, they furnish in-depth information surpassing that of chemical databases, which tend to include assay results of varying levels and often overlook essential details regarding functional selectivity. Nonetheless, because of the limited amount of high-quality 3D data available, a prediction model developed from these data requires thorough validation to ascertain its applicability across a diverse spectrum of examples. We therefore used in silico generated data to further evaluate the performance of GPCR-IPL score in three scenarios.

Scenario 1

The GPCR-IPL score was applied to molecular docking simulations, which are pivotal in drug discovery. We extracted out-of-set data from public databases, resulting in 5886 GPCR ligands of 23 typical class A GPCRs (Table S8) with ligand information (assays for direct ligand binding and their bioactivities). This set included receptors such as adenosine receptor B2, muscarinic receptor M2 (M2), 5-hydroxytryptamine receptor B1, C-X-C chemokine receptor type 4, Mu Opioid, Dopamine Receptor 2, Orexin Receptor 1, Histamine Receptor 3, Cannabinoid Receptor 1, Oxytocin receptor, Prostaglandin E (Types 2–4) and 10 AlphaFold target proteins such as Melanin-concentrating hormone receptor 1 and 2, Neuropeptides B/W receptor 1, Neuropeptide FF receptor 1, Neuropeptide S receptor, Oxoeicosanoid receptor 1, Prolactin Releasing Hormone Receptor, Prokineticin Receptor 1 and 2 and Urotensin 2 Receptor. The conformers of these ligands were prepared and docked to the respective GPCRs, as stated in the Materials and Methods section, and the functional selectivity of the docking poses was predicted using GPCR-IPL. Despite the virtual nature of the data, the model exhibited impressive predictive performance, particularly in ACC and F1 (Figures 11 and 12). Evaluations on ligands associated with the AlphaFold targets further emphasized the model’s precision and adaptability to a range of protein structures, with consistent outcomes. This uniformity, especially with AlphaFold proteins, supports the model’s robustness. Tables S8 and S9 detail the performance metrics for each target, reinforcing the ability of the GPCR-IPL score model to handle varying ligands and protein targets. The model’s performance spectrum, from impeccable ACC in many targets to an average F1 of 0.584 in others, underscores both its potential and the room for further optimization in pose generation. 
For example, when we further optimized the molecular geometry of four targets in the out-of-set data, the statistical metrics improved slightly (Figure 11B). This result demonstrates that better processing of simulated data can improve the model’s predictive power. To test more refined simulation data, we shift from the large scale of Scenario 1 to a specific GPCR (in Scenario 3). Despite the inherent challenges posed by certain targets, the achievements of the model establish a strong foundation for its broad application in molecular biology and pharmacology. We remain hopeful that obtaining real 3D data of the respective in silico complexes will further enhance the model’s performance.
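
Per-target ACC and F1, as reported for the out-of-set targets, can be computed from predicted and known ligand functions. The sketch below is illustrative only: the target names and records are hypothetical, ‘antagonist’ is taken as the positive class following the convention stated earlier, and per_target_metrics is not part of the GPCR-IPL code.

```python
from collections import defaultdict

POSITIVE = "antagonist"  # positive class, following the convention in the text


def per_target_metrics(records):
    """Compute ACC and F1 per target.

    records: iterable of (target, true_label, predicted_label) tuples.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for target, truth, pred in records:
        if truth == POSITIVE:
            key = "tp" if pred == POSITIVE else "fn"
        else:
            key = "fp" if pred == POSITIVE else "tn"
        counts[target][key] += 1

    metrics = {}
    for target, c in counts.items():
        n = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        f1_denominator = 2 * c["tp"] + c["fp"] + c["fn"]
        metrics[target] = {
            "ACC": (c["tp"] + c["tn"]) / n,
            # F1 is set to 0.0 when there are no positives at all.
            "F1": 2 * c["tp"] / f1_denominator if f1_denominator else 0.0,
        }
    return metrics


# Hypothetical docking-pose predictions, for illustration only.
records = [
    ("target_A", "antagonist", "antagonist"),
    ("target_A", "agonist", "agonist"),
    ("target_A", "antagonist", "agonist"),
    ("target_B", "antagonist", "antagonist"),
    ("target_B", "antagonist", "antagonist"),
]
metrics = per_target_metrics(records)
for target, values in metrics.items():
    print(target, values)
```

Reporting the two metrics per target makes it visible when a high ACC coexists with a lower F1 on targets with few or poorly predicted positive examples.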

Figure 11.

Out-of-set validation of GPCR-IPL score for in silico generated GPCR complexes. (A) Non-optimized 23 Class A GPCRs; (B) example of optimized Class A GPCRs.

Figure 12.

Representative complexes of in silico generated typical Class A GPCR data.

Scenario 2

During our research, we observed that additional GPCR data were added to GPCRdb. In response to the update, we adopted a dual approach: first, evaluating the existing model against an independent data set and, second, augmenting the data set with the new data to further evaluate the model’s predictive performance. Initially, the new experimental data of 175 protein–ligand complexes (Table S10) were treated as a novel, unseen out-of-set (neither training nor test data) to determine whether the original model, based on 396 protein–ligand pairs, could retain its predictive performance. On this third set, the GPCR-IPL model attained an auROC of 0.691, auPR of 0.570, ACC of 0.606 and F1 of 0.660. As expected, the outcomes for the third set were lower than those of all models developed from the initial data set, across the range of features and ML methods. Considering the inherent difficulty of making predictions on new data, some decrease in the performance metrics is unavoidable. We judge that the generalization capability of the model can improve with future data updates. After the first evaluation, the new data were integrated into the initial data set, with the expectation that the update would help evaluate the model’s robustness. This integration also facilitated a direct performance comparison between data sets. The classification model was developed using the updated data set, adhering to the conditions of the best-performing model from the initial data set, with no further feature optimization or ML method refinement. Despite the absence of any optimization process, the model built on the updated data showed robust predictive performance after the addition of dissimilar data to the initial data set. Every statistical metric (auROC: 0.855, auPR: 0.808, F1: 0.858) of the model derived from the updated data set resides within the performance spectrum of models constructed using the initial data set (Table 3). 
Thus, the metrics of the updated data set are not substandard relative to those of the initial data set, demonstrating the robustness of the GPCR-IPL model. The dual approach employing the additional GPCRdb data showed that GPCR-IPL, despite its constrained performance on data quite different from the initial data set, has potential for improvement underpinned by its robust performance.
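
The auROC used to compare the initial, third and updated data sets can be computed without any library through its rank interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses toy labels and scores; auroc is an illustrative helper, not the implementation used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auroc(labels, scores)
print(example_auc)
```

The pairwise loop is quadratic in the number of examples, which is acceptable for data sets of a few hundred complexes; a sort-based rank computation would be preferable for much larger sets.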

Table 3.

Evaluation of model robustness according to data set update

Method | Data set | Validation | ACC | MCC | F1 | auPR | auROC
F01/LR optimized | Initial GPCRdb | 10-CV | 0.922 | 0.841 | 0.934 | 0.908 | 0.917
F01/LR optimized | Initial GPCRdb | External | 0.938 | 0.872 | 0.940 | 0.907 | 0.938
Optimized^a | Initial GPCRdb | 10-CV | 0.867 (0.068) | 0.728 (0.142) | 0.890 (0.054) | 0.852 (0.064) | 0.858 (0.073)
Optimized^a | Initial GPCRdb | External | 0.881 (0.065) | 0.757 (0.136) | 0.878 (0.069) | 0.835 (0.082) | 0.879 (0.070)
F01/LR optimized | Updated GPCRdb | 10-CV | 0.905 | 0.811 | 0.894 | 0.856 | 0.903
F01/LR optimized | Updated GPCRdb | External | 0.861 | 0.720 | 0.858 | 0.808 | 0.855

Note: ^a Mean (SD) of respective metrics in Table S6.

Scenario 3

We then applied GPCR-IPL to the antagonist design of Mas-related GPCRs (MRGPRs), which are non-typical GPCRs that differ considerably from the rest of the data set. MRGPRX receptors (e.g. human MRGPRs) can activate nearly all types of G-proteins and lack, or contain modified versions of, the canonical trigger motifs for GPCR activation (e.g. PIF, CWxP and DRY motifs and toggle switch W6x48) [39]. Although cryo-EM structures of MRGPRX-agonist complexes [39] are available, the rational design of novel antagonists is still challenging. Therefore, GPCR-IPL was applied to identify an unprecedented antagonist of this challenging GPCR. The cryo-EM structures [39] allowed the reliable assignment of generic numbering to generate INT_Feat. Subsequently, in silico data (MRGPRX-ligand complexes) were generated from our in-house library using molecular docking and MD simulations. We created the GPCR-IPL features (‘INT_Feat’, ‘POCK_Feat’ and ‘LIG_Feat’) from the in silico data to predict the functional selectivity of the in-house compounds. We further ascertained the antagonistic nature of the in-house compounds through feature engineering, followed by in vitro validation. As a result, we identified KMH-45, an unprecedented MRGPRX2 antagonist, and characterized it using the multilevel features. In detail, we compared the predictions of GPCR-IPL score with the energy trajectories of multiple conformational states (four runs of 100 ns MD simulations: SIM01–SIM04) (Figure 13A). The SHAP analysis of the complexes was compared with that of the training data set to find which features are notable in MRGPRX antagonism (Figure S3). For several Top features of the training set, we counted how often they occur in the 4000 MD frames (Figure 13B). The Top1 feature (7.39x39_Hyd) was present in all simulations, with the highest frequency in SIM03. Top2 (6.51x51_Hyd) was present in SIM03, SIM04 and SIM02, in that order. Top3 (3.33x33_Hyd) was present exclusively in SIM03. 
Top4 (6.48x48_Hyd), which has negative SHAP values (agonistic), was absent in every frame. Following the frequency analysis of INT_Feat, we further examined the distance distributions of the INT_Feat (Figure 13C). Although the relative frequency of the features differed considerably between simulations, the distance ranges of the hydrophobic interactions (3.2–4.5 Å) deviated little between simulations. Meanwhile, the replaced input data changed POCK_Feat (the zoomed-in feature) more than INT_Feat (the zoomed-out feature). The disparate Top features between MD frames were noteworthy. In particular, the 3D location of the representative POCK_Feat was well identified in the correctly predicted frames of the binding free energy plot (Figure 13D). The correct frames showed that the π-planar ring of the ligand (KMH-45) is tightly packed among the triad residues of poc_bit1634. Indeed, poc_bit1634 and poc_bit1695 emerged as Top6 and Top8, respectively, in the frame with the most favorable MM-GBSA energy (983rd frame of SIM03). Conversely, incorrectly predicted frames highlighted the inclination of the π-planar ring toward the triad residues of either poc_bit3957 or poc_bit3140. Moreover, although the first frames (initial docking poses) of the MD simulations did not commonly share the Top features (poc_bit1634 and poc_bit1695) either, their POCK_Feat patterns differed from those of the incorrect frames (Figure 13D). While the Top POCK_Feat of every first frame are included in the Top30 list of the training set (poc_bit864, poc_bit1329, poc_bit3916 and poc_bit3957 as well as poc_bit1634), some Top POCK_Feat of the incorrectly predicted frames were absent from the list (poc_bit3140, poc_bit1257 and poc_bit2537). The different ‘pocket-level’ patterns among the first, correct and incorrect frames demonstrated that the zoomed-in POCK_Feat can capture the subtle but critical motion of a GPCR binding pocket. 
In other words, GPCR-IPL is a competent score that can detect a stable conformational state through the common POCK_Feat of multiple conformers. As the simulation progressed, the ligand underwent conformational changes to optimize its interaction with the triad residues of the Top positive POCK_Feat. The in vitro potency of KMH-45 was measured in a calcium flux assay (FLIPR) with Cortistatin-14 (a peptide agonist) for MRGPRX activation. The assay confirmed the MRGPRX2 antagonism of KMH-45 with an IC50 of 15.6 μM (Figure S4). Based on this case study, we conclude that our method facilitates GPCR drug design by identifying crucial 3D features based on the conformational dynamics of a challenging GPCR.
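
The frequency analysis of Top features across MD frames can be sketched as a simple count per simulation. In the sketch below, each frame is represented as the set of features detected in it; this representation, the simulation names and the toy frames are assumptions for illustration and do not reproduce the data behind Figure 13B.

```python
def feature_frequencies(frames, features):
    """Fraction of frames in which each feature is present.

    frames: list of sets, one set of detected feature names per MD frame.
    features: feature names to count.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    return {
        feature: sum(feature in frame for frame in frames) / len(frames)
        for feature in features
    }


TOP_FEATURES = ["7.39x39_Hyd", "6.51x51_Hyd", "3.33x33_Hyd", "6.48x48_Hyd"]

# Toy trajectories with four frames each; NOT the data behind Figure 13B.
simulations = {
    "SIM_a": [{"7.39x39_Hyd"}, set(),
              {"7.39x39_Hyd", "6.51x51_Hyd"}, set()],
    "SIM_b": [{"7.39x39_Hyd", "3.33x33_Hyd"}, {"7.39x39_Hyd"},
              {"7.39x39_Hyd", "6.51x51_Hyd"}, {"6.51x51_Hyd"}],
}

frequencies = {
    name: feature_frequencies(frames, TOP_FEATURES)
    for name, frames in simulations.items()
}
for name, values in frequencies.items():
    print(name, values)
```

Comparing such per-simulation frequencies against the SHAP sign of each feature indicates which trajectories spend more time in conformations that support the predicted function.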

Figure 13.

Application of GPCR-IPL to a simulated MRGPRX complex. (A) The binding free energy (MM-GBSA) plot of MD simulations started from four different docking poses (SIM01–4). The x-axis: MD frame number; y-axis: MM-GBSA (kcal/mol). For readability of each simulation, the baseline of MM-GBSA energy was adjusted individually. (B) Relative frequencies of representative features in 4000 MD frames. (C) Hydrophobic interaction distance distribution of representative INT_Feat in MD trajectories: 7.39x39_Hyd, Left; 3.33x33_Hyd, Middle; 6.51x51_Hyd, Right. The x-axis: MD simulation number; y-axis: distances (Å). (D) 3D location of representative POCK_Feat in the first frames and chosen correctly and incorrectly predicted frames.

Study scope, limitation and comparison with known predictive models of GPCR drug discovery

Prediction methods have been used to facilitate GPCR drug discovery, thereby saving significant time and cost in laboratories. Many of these methods, including our prior research SMPLIP-Score [26] and KLD-DTI [53, 54], have focused on DTI or DTA prediction. Meanwhile, few studies have attempted to predict the functions that occur after GPCR–ligand binding. Moreover, while many models focus on prediction performance, model interpretability (‘how the prediction is reached’) often remains unaddressed. Thus, we summarized recently reported GPCR models and compared them with GPCR-IPL. We examined the (1) featurization methods, (2) model interpretation, (3) coverage of GPCR families and (4) prediction goal (Table 4). Although functional selectivity (DeepREAL) [22] and biased activation (BiasNet) [23] were predicted before GPCR-IPL score, DeepREAL’s out-of-set validation for individual GPCRs was limited to opioids, and BiasNet could only distinguish between arrestin and G-proteins and could not explain the dramatically different activations of G-protein subtypes (Table 4). Moreover, the featurization of all these models was performed using only 2D features, and some models appear to need verification of feature collinearity [23, 56].

Table 4.

Comparative analysis of predictive models for GPCR drug discovery

Prediction | Feature | Interpret. | GPCR Cover. | AUC ACC F1 MCC | Lit.
Functional selectivity | Protein–Ligand 3D Complex (INT_Feat/POCK_Feat); 3D Ligand Geometry (LIG_Feat) | SHAP Analysis; 3D-Location | Classes A–F | 0.938 0.938 0.940 0.872 | This study
Biased activation | as above | as above | as above | 0.930 0.900 0.922 0.823 | This study
Functional selectivity | Protein Seq., 2D Ligand Graph (SMILES) | None | Opioids of Class A | 0.910 0.660 | Lei Xie et al. [22]/DeepREAL
DTI | Protein Seq. (PSSM), 2D Ligand (PubChem FP) | None | Unclear | 0.867 0.872 0.745 | Wang et al. [55]
DTI | 2D Ligand (3 FP + 119 MolDes) | None | Classes A–C | 0.941 0.903 0.903 | Xian-Qun Xie et al. [56]
Biased activation | 2D Ligand (8 FP + 208 MolDes) | None | Unclear | 0.841 0.842 0.769 0.651 | Sirimulla et al. [23]/BiasNet

Abbreviations: Interpret.: Model Interpretation; GPCR Cover.: GPCR Coverage of the Model; AUC: Area Under the Curve; SMILES: Simplified molecular-input line-entry system; PSSM: Position-Specific Scoring Matrix; Protein Seq.: 2D sequence of each protein; 2D Ligand: 2D structure of each ligand; FP: Molecular Fingerprint; MolDes: Molecular Descriptors.

None of the reported studies directly interpreted their models. In contrast, GPCR-IPL score is the only explainable model, through its descriptions of (1) quantitative feature importance, (2) feature type of target–drug pairs and (3) locational information of 3D features. Most notably, the multilevel 3D information (especially the 3D-locational information of the Top features) provides direct guidance for drug design. In detail, the 3D information can answer general questions in drug design: (1) at which position a substituent (side-chain) should be introduced, (2) how to create a synthetic scheme for a proposed scaffold according to the position of the substitution and (3) how to conduct patent analysis for the respective model. For example, INT_Feat (e.g. Top1, 7.39x39_HYD) and POCK_Feat can suggest a position for the substituent in a GPCR scaffold (Figure 9A). POCK_Feat can give high confidence in the correct poses of the scaffold and substituent. With the chosen substituent position, LIG_Feat (e.g. lig_bit911) can serve as a hit filtering criterion for library screening (of both focused and random libraries). Then, the chosen hit compounds (being complete structures) can serve as direct queries for chemists creating a synthetic scheme. Similarly, the Top features can be used directly to check whether a molecule matching the 3D features already exists in a known patent. In contrast, 2D molecular features do not include the relative positional information between a GPCR and a ligand. Thus, the absence of alignment between scaffolds (or ligands) makes it difficult to find a positional feature as a design guide for introducing a substituent. In particular, drug design from small or flexible scaffolds is impossible because of their high degree of freedom for alignment. 
Despite these merits in interpretation and drug design, every 3D model, including GPCR-IPL score and DTA models, is limited by the available data (the number of targets with known structures, identified binding sites and conformational motion) and inevitably needs simulated data for practical application. Fortunately, advances in protein technology are accelerating the generation of 3D data and easing this limitation. Moreover, considering that 107 GPCRs and their corresponding 481 drugs had been identified by 2017 [4], the predictive scope of GPCR-IPL score, which covers 126 GPCRs of Classes A–F, is comparable to the scope of current GPCR drug discovery. Finally, while a 2D feature model can be compared with another 2D model using identical data sets, GPCR-IPL score is the first 3D model to predict post-binding events of GPCRs, and no other 3D model exists for a performance comparison on identical data.

CONCLUSIONS

This study designed 3D multilevel features to capture GPCR–ligand interaction patterns and applied the feature sets to predict post-binding events of GPCRs. The predictive performance of GPCR-IPL score was evaluated along with geometry optimization and an ablation study. Furthermore, three scenarios were implemented for out-of-set validation and the identification of a new MRGPRX antagonist. The multilevel 3D features were shown to give direct and quantitative guidance for GPCR drug design. The current scarcity of 3D experimental data led to two limitations of this study: (1) target coverage limited to 126 GPCRs and (2) limited comparison of predictive power between this 3D model and known 2D models because identical data sets are unsuitable (e.g. they would require data conversion from 2D to 3D). We expect that advances in protein technologies will ease the 3D data issue. Solving big questions in drug discovery and biology requires not only state-of-the-art model architectures but also advances in molecular featurization beyond protein sequences and chemical SMILES. Thus, we expect GPCR-IPL to become a powerful model in the near future through the combination of advanced ML methods with these features.

Key Points

  • Multi-level featurization to describe G-protein-coupled receptor (GPCR)–ligand interaction patterns as a surrogate of the whole conformational dynamics.

  • GPCR-IPL score: predicting the biased activation of GPCR ligands and functional selectivity as the post-binding events of GPCR–ligand complexes.

  • Model interpretation with the interactive descriptions on interaction types of protein–drug pairs, their 3D location and quantified feature importance.

  • Out-of-set validation and application of the GPCR-IPL score with three scenarios.

Supplementary Material

2_Supplementary_File_Revised_V3_bbae105

Author Biographies

Biographical note and significance of this study

Despite the long history of G-protein-coupled receptor (GPCR) drug discovery along with computational methods, there still exist challenges in GPCR drug design. In particular, small substituent modification on GPCR ligands can substantially change post-binding events such as functional selectivity (agonism/antagonism) for specific GPCRs or biased activation (activation/suppression) for GPCR-mediated signal transduction. This study aimed to investigate GPCR–ligand interaction patterns and to use the patterns for predicting the post-binding events that occur through their conformational dynamics subsequent to the extracellular interactions. GPCR-IPL is a part of our continuous research into 3D molecular featurization. It offers 3D feature-driven guidance on modifying GPCR ligands to enhance the structure-activity relationship.

Mi-hyun Kim holds the position of Associate Professor at Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, Republic of Korea. Her scholarly interests include leveraging machine learning (ML) and deep learning (DL) techniques for the prediction of drug-target interactions and drug-drug interactions, the de novo design of small molecules and peptides (including artificial ones), as well as advancing the field of Organic Synthetic Chemistry.

Surendra Kumar is an Assistant Research Professor at Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, Republic of Korea. His research primarily focuses on the application of machine learning (ML) and deep learning (DL) methodologies in the field of drug discovery, specifically targeting the development of small molecules and artificial peptides.

Mahesh K. Teli is a researcher at Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, Republic of Korea. His research focuses on Structural Biology and Drug Design.

Contributor Information

Surendra Kumar, Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.

Mahesh K Teli, Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.

Mi-hyun Kim, Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.

FUNDING

This study was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), which is funded by the Ministry of Education, Science and Technology (Nos: 2020R1I1A1A01074750, 2022R1A2C2091810).

AUTHORS’ CONTRIBUTIONS

M.K. conceived and designed the study. Under M.K.’s plan, S.K. carried out all modeling and data work. M.K., M.K.T. and S.K. analyzed the data. M.K. and S.K. wrote the manuscript and revised it. M.K. provided the molecular modeling lab facility. All authors read and approved the final manuscript.

DATA AVAILABILITY

GPCR-IPL is available via the web server: https://gpcr-ipl-score.onrender.com/. All source code and data are available on GitHub: https://github.com/college-of-pharmacy-gachon-university/GPCR-IPL_Score.

References

  • 1. Hilger D, Masureel M, Kobilka BK. Structure and dynamics of GPCR signaling complexes. Nat Struct Mol Biol 2018;25:4–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Latorraca NR, Venkatakrishnan AJ, Dror RO. GPCR dynamics: structures in motion. Chem Rev 2017;117:139–55. [DOI] [PubMed] [Google Scholar]
  • 3. Fredriksson R, Lagerström MC, Lundin L-Get al. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, Paralogon groups, and fingerprints. Mol Pharmacol 2003; 63:1256–72 [DOI] [PubMed] [Google Scholar]
  • 4. Hauser AS, Attwood MM, Rask-Andersen M, et al. Trends in GPCR drug discovery: new agents, targets and indications. Nat Rev Drug Discov 2017;16:829–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Heng BC, Aubel D, Fussenegger M. An overview of the diverse roles of G-protein coupled receptors (GPCRs) in the pathophysiology of various human diseases. Biotechnol Adv 2013;31:1676–94. [DOI] [PubMed] [Google Scholar]
  • 6. Huang Y, Todd N, Thathiah A. The role of GPCRs in neurodegenerative diseases: avenues for therapeutic intervention. Curr Opin Pharmacol 2017;32:96–110. [DOI] [PubMed] [Google Scholar]
  • 7. Sriram K, Insel PA. G protein-coupled receptors as targets for approved drugs: how many targets and how many drugs? Mol Pharmacol 2018;93:251–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Thomsen W, Frazer J, Unett D. Functional assays for screening GPCR targets. Curr Opin Biotechnol 2005;16:655–65. [DOI] [PubMed] [Google Scholar]
  • 9. Miao Y, McCammon JA. G-protein coupled receptors: advances in simulation and drug discovery. Curr Opin Struct Biol 2016;41:83–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Lee Y, Basith S, Choi S. Recent advances in structure-based drug design targeting class a G protein-coupled receptors utilizing crystal structures and computational simulations. J Med Chem 2018;61:1–46. [DOI] [PubMed] [Google Scholar]
  • 11. Tan L, Yan W, McCorvy JD, Cheng J. Biased ligands of G protein-coupled receptors (GPCRs): structure–functional selectivity relationships (SFSRs) and therapeutic potential. J Med Chem 2018;61:9841–78. [DOI] [PubMed] [Google Scholar]
  • 12. Wang J, Miao Y. Chapter eleven - recent advances in computational studies of GPCR-G protein interactions. Adv Protein Chem Struct Biol 2019;116:397–419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Powers AS, Pham V, Burger WAC, et al. Structural basis of efficacy-driven ligand selectivity at GPCRs. Nat Chem Biol 2023;19:805–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Kolakowski LF. GCRDb: a G-protein-coupled receptor database. Receptors Channels 1994;2:1–7. [PubMed] [Google Scholar]
  • 15. Ciancetta A, Sabbadin D, Federico S, et al. Advances in computational techniques to study GPCR–ligand recognition. Trends Pharmacol Sci 2015;36:878–90.26538318 [Google Scholar]
  • 16. Zhang D, Zhao Q, Beili W. Structural studies of G protein-coupled receptors. Mol Cells 2015;38:836–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Ibrahim P, Clark T. Metadynamics simulations of ligand binding to GPCRs. Curr Opin Struct Biol 2019;55:129–37. [DOI] [PubMed] [Google Scholar]
  • 18. Yuan S, Peng Q, Palczewski K, et al. Mechanistic studies on the stereoselectivity of the serotonin 5-HT 1A receptor. Angew Chem Int Ed 2016;55:8661–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Mattedi G, Deflorian F, Mason JS, et al. Understanding ligand binding selectivity in a prototypical GPCR family. J Chem Inf Model 2019;59:2830–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Kooistra AJ, Leurs R, Esch IJP, de Graaf C. Structure-based prediction of G-protein-coupled receptor ligand function: a β-adrenoceptor case study. J Chem Inf Model 2015;55:1045–61. [DOI] [PubMed] [Google Scholar]
  • 21. Nicoli A, Dunkel A, Giorgino T, et al. Classification model for the second extracellular loop of class a GPCRs. J Chem Inf Model 2022;62:511–22. [DOI] [PubMed] [Google Scholar]
  • 22. Cai T, Abbu KA, Liu Y, Xie L. DeepREAL: a deep learning powered multi-scale modeling framework for predicting out-of-distribution ligand-induced GPCR activity. Bioinformatics 2022;38:2561–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Sanchez JE, Kc GB, Franco J, et al. BiasNet: a model to predict ligand bias toward GPCR signaling. J Chem Inf Model 2021;61:4190–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Zhou Q, Yang D, Wu M, et al. Common activation mechanism of class a GPCRs. Elife 2019;8:e50279. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Wingler LM, Lefkowitz RJ. Conformational basis of G protein-coupled receptor signaling versatility. Trends Cell Biol 2020;30:736–47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Kumar S, Kim M. SMPLIP-score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors. J Chem 2021;13:28. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Yamanishi Y, Araki M, Gutteridge A, et al. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008;24(13):i232–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Nigsch F, Bender A, Jenkins JL, Mitchell JBO. Ligand-target prediction using winnow and naive Bayesian algorithms and the implications of overall performance statistics. J Chem Inf Model 2008;48:2313–25. [DOI] [PubMed] [Google Scholar]
  • 29. LI H, Sze K-H, Lu G, Ballester PJ. Machine-learning scoring functions for structure-based drug lead optimization. Wiley interdisciplinary reviews: computational molecular. Science 2020;10(5):e1465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Luo Y, Zhao X, Zhou J, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 2017;8:573. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Xiong G, Shen C, Yang Z, et al. Featurization strategies for protein–ligand interactions and their applications in scoring function development. Wiley interdisciplinary reviews: computational molecular. Science 2022;12(2):e1567. [Google Scholar]
  • 32. Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions. Brief Bioinform 2022;23(1):bbab476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Venkatakrishnan AJ, Deupi X, Lebon G, et al. Molecular signatures of G-protein-coupled receptors. Nature 2013;494:185–94. [DOI] [PubMed] [Google Scholar]
  • 34. Weill N, Rognan D. Alignment-free ultra-high-throughput comparison of druggable protein−ligand binding sites. J Chem Inf Model 2010;50:123–35. [DOI] [PubMed] [Google Scholar]
  • 35. Axen SD, Huang X-P, Cáceres EL, et al. A simple representation of three-dimensional molecular structure. J Med Chem 2017;60:7393–409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Kooistra AJ, Mordalski S, Pándy-Szekeres G, et al. GPCRdb in 2021: integrating GPCR sequence, structure and function. Nucleic Acids Res 2021;49:D335–43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Chan WKB, Zhang Y. Virtual screening of human class-a GPCRs using ligand profiles built on multiple ligand–receptor interactions. J Mol Biol 2020;432:4872–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Wlodek S, Skillman AG, Nicholls A. Ligand entropy in gas-phase, upon solvation and protein complexation. Fast estimation with quasi-Newton hessian. J Chem Theory Comput 2010;6:2140–52. [DOI] [PubMed] [Google Scholar]
  • 39. Cao C, Kang HJ, Singh I, et al. Structure, function and pharmacology of human itch GPCRs. Nature 2021;600:170–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Madhavi Sastry G, Adzhigirey M, Day T, et al. Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J Comput Aided Mol Des 2013;27:221–34. [DOI] [PubMed] [Google Scholar]
  • 41. Kumar S, Jang C, Subedi L, et al. Repurposing of FDA approved ring systems through bi-directional target-ring system dual screening. Sci Rep 2020;10:21133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in python. J Mach Learn Res 2011;12:2825–30. [Google Scholar]
  • 43. Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: Unbiased Boosting with Categorical Features. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N (eds). NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc. NY, United States, 2018.
  • 44. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017;30:1–9. [Google Scholar]
  • 45. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta Protein Struct 1975;405:442–51. [DOI] [PubMed] [Google Scholar]
  • 46. Sasaki Y. The truth of the F-measure. Teach Tutor Mater 2007;1:1–5. [Google Scholar]
  • 47. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006;27:861–74. [Google Scholar]
  • 48. Boyd K, Eng KH, Page CD. Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals, Springer, Berlin, Heidelberg, 2013, 451–66.
  • 49. Pagadala NS, Syed K, Tuszynski J. Software for molecular docking: a review. Biophys Rev 2017;9:91–102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Raschka S. MLxtend: providing machine learning and data science utilities and extensions to Python’s scientific computing stack. J Open Source Softw 2018;3:638. [Google Scholar]
  • 51. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30:1–9. [Google Scholar]
  • 52. McCorvy JD, Wacker D, Wang S, et al. Structural determinants of 5-HT2B receptor activation and biased agonism. Nat Struct Mol Biol 2018;25:787–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Lee S-H, Ahn S, Kim M. Comparing a query compound with drug target classes using 3D-chemical similarity. Int J Mol Sci 2020;21:4208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Ahn S, Lee SE, Kim M. Random-forest model for drug–target interaction prediction via Kullback–Leibler divergence. J Chem 2022;14:67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Wang L, You Z-H, Li L-P, et al. Incorporating chemical sub-structures and protein evolutionary information for inferring drug-target interactions. Sci Rep 2020;10:6641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Hou T, Bian Y, McGuire T, Xie XQ. Integrated multi-class classification and prediction of GPCR allosteric modulators by machine learning intelligence. Biomolecules 2021;11:870. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

2_Supplementary_File_Revised_V3_bbae105


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
