Abstract
The orexin 1 receptor (OX1R) is a G-protein coupled receptor that regulates a variety of physiological processes through interactions with the neuropeptides orexin A and B. Selective OX1R antagonists exhibit therapeutic effects in preclinical models of several behavioral disorders, including drug seeking and overeating. However, currently there are no selective OX1R antagonists approved for clinical use, fueling demand for novel compounds that act at this target. In this study, we meticulously curated a dataset comprising over 1300 OX1R ligands using a stringent filter and criteria cascade. Subsequently, we developed highly predictive quantitative structure-activity relationship (QSAR) models employing the optimized hyper-parameters for the random forest machine learning algorithm and twelve 2D molecular descriptors selected by recursive feature elimination with a 5-fold cross-validation process. The predictive capacity of the QSAR model was further assessed using an external test set and enrichment study, confirming its high predictivity. The practical applicability of our final QSAR model was demonstrated through virtual screening of the DrugBank database. This revealed two FDA-approved drugs (isavuconazole and cabozantinib) as potential OX1R ligands, confirmed by radiolabeled OX1R binding assays. To the best of our knowledge, this study represents the first report of highly predictive QSAR models on a large comprehensive dataset of diverse OX1R ligands, which should prove useful for the discovery and design of new compounds targeting this receptor.
Keywords: Hypocretin, Orexin, Hypothalamus, Machine learning, Random forest, Feature selection, QSAR, Virtual screening
1. Introduction
Orexins A and B (also known as hypocretins 1 and 2) are a pair of neuropeptides produced exclusively by neurons in the caudal hypothalamus [1,2]. Despite their relatively small number (~4000 neurons in rat, ~70,000 in human) [3,4], orexin neurons mediate a broad range of physiological processes by acting at two receptor sub-types (orexin receptor 1 and 2; OX1R, OX2R) that are broadly distributed throughout the central nervous system [5]. Orexins have been shown to be critical regulators of arousal/wakefulness, stress reactivity and reward motivation [6–12], making this system an attractive target for pharmacological compounds designed to treat a myriad of disease states.
Early preclinical work indicated that blocking orexin signaling promoted sleep, and that better outcomes were achieved with antagonists that acted at both OX1R and OX2R compared with either receptor alone [13]. Consequently, three dual orexin receptor antagonists, suvorexant, lemborexant and daridorexant (Fig. 1A), have been approved by the FDA for clinical management of insomnia [14–17]. Since then, significant advances have been made in the understanding of OX1R vs. OX2R signaling, with substantial evidence now pointing to the potential utility of selective orexin receptor antagonists for non-sleep related indications. Most notably, excessive signaling at OX1R has been specifically implicated in various conditions, including substance use disorders, overeating, anxiety and hypertension, as well as several digestive cancers [18–30]. SB-334867, the first selective OX1R antagonist [31], was effective in vivo at attenuating drug and food seeking behaviors, but failed to progress to clinical use due to poor stability [32]. ACT-539313 (Fig. 1A), an orally active selective OX1R antagonist, advanced into phase 2 clinical trials for the treatment of binge eating disorder [33,34], but did not exhibit statistically significant efficacy in reducing binge eating episodes [35–37]. Although there are several other OX1R antagonists in the developmental pipeline (e.g., JNJ-61393215, anxiety disorder; AZD4041, opioid use disorder), at present none has gained FDA approval for clinical use. Thus, there is interest in the development of novel OX1R antagonists [38], as well as the repurposing of existing compounds with previously unrecognized actions at OX1R.
Fig. 1.

A: Representative dual orexin receptor antagonists (suvorexant, lemborexant and daridorexant) and a selective OX1R antagonist (ACT-539313). B: Molecular alignment of 14 human OX1R crystal structures complexed with diverse ligands. C: Extracellular view of ligands binding to the OX1R. The suvorexant-bound crystal structure (6TO7.pdb) is shown in cyan ribbons. Ligands are rendered as stick models and colored by atom type. Ligands adopt an uncommon horseshoe binding conformation.
To this end, extensive efforts have been made to solve the crystal structures of the human OX1R with diverse ligands. So far, 14 X-ray crystal structures of human OX1R have been deposited in the Protein Data Bank (PDB; www.rcsb.org) [39]. As shown in Fig. 1B, all ligands bind to the orthosteric site of OX1R, the typical ligand-binding site in GPCRs. However, structural analysis revealed interactions that are unique to the crystal structures of orexin receptors. All OX1R ligands adopt an uncommon horseshoe binding conformation resulting from intra-molecular pi-stacking interactions (Fig. 1C) [40–42]. Meanwhile, lipophilic hotspots and water molecules were found to play important roles in ligand-OX1R interactions. Molecular docking, the primary method for structure-based drug design, usually ignores or underestimates the energy contributions of intra-molecular interactions, lipophilic hotspots, and water-mediated hydrogen bonds. Thus, docking-based virtual screening using the OX1R crystal structures is unlikely to be effective in identifying new OX1R ligands. Consequently, we resorted to ligand-based drug design methods for our research needs. Surprisingly, PubMed searches using “orexin receptor” and “QSAR” or “machine learning” as search terms did not identify any prior publications on such topics. Therefore, as a first step, we elected to develop quantitative structure-activity relationship (QSAR) models for OX1R ligands.
Here, molecular descriptor based QSAR models were developed for a large dataset of known OX1R ligands that were downloaded from the ChEMBL database [43]. KNIME workflows (https://www.knime.com/) were constructed to rigorously filter the dataset using best practices recommended by the Organization for Economic Cooperation and Development (OECD, http://www.oecd.org/). Highly predictive QSAR models were obtained employing the random forest machine learning method together with molecular operating environment (MOE) descriptors (Chemical Computing Group, Montreal, QC, Canada). The predictive capacity of the QSAR model was assessed using an external test set and an enrichment study. The practical applicability of our final QSAR model was demonstrated through virtual screening of the DrugBank database of FDA-approved drugs [44] and subsequent identification of two novel OX1R ligands.
2. Materials and methods
All calculations were performed on a MacBook with a 2.8 GHz Quad-Core Intel Core i7 processor and a memory of 16 GB RAM using Molecular Operating Environment (MOE.2020, Chemical Computing Group, Montreal, QC, Canada), KNIME (version 4.6.3) [45], Python (version 3.9.7) and Jupyter Notebook (version 6.4.5).
2.1. Data curation
We developed a workflow (Fig. 2) to collect bioactive ligands for human OX1R by using the open-source data analytics platform KNIME and its Community Extensions (available free of charge at https://www.knime.com/). The ChEMBL ID (CHEMBL5113) for human OX1R was used as the query to collect all OX1R ligands from the ChEMBL (version 30) website (https://www.ebi.ac.uk/chembl/) [43]. After all bioactive OX1R compounds were downloaded to KNIME, a series of filtering steps was applied to eliminate those not fitting our criteria. First, only entries with exact measurements (“standard_relation” matches “=”) were kept and censored data (i.e., > or <) were removed. Second, only data measured in binding assays (“assay_type” = “B”) and reported with binding constants (“standard_type” = “Ki”) were retained. Next, compounds with missing chemical structures (“canonical_smiles” = “missing”) were removed. Compounds with identical IDs (“molecule_chembl_id”) were then labeled as duplicates. If the difference in Ki values between duplicates was less than 10-fold, the first entry was kept for this compound. However, if the difference in Ki values was more than 10-fold, we searched the structures in ChEMBL and original publications to ensure we chose the correct Ki values. Resulting entries were further evaluated to confirm that “standard_units” were equal to “nM”. Finally, the Ki values of all entries that passed the aforementioned filters and criteria were converted to pKi values (pKi = −log10(Ki), with Ki expressed in mol/L).
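The filter cascade described above can be sketched in plain Python. This is only an illustration and not the KNIME workflow itself: the field names follow the ChEMBL columns quoted in the text, whereas the helper names (`pki_from_nm`, `curate`), the placeholder compound IDs and the three example records are hypothetical. It assumes Ki is reported in nM, so that pKi = 9 − log10(Ki in nM).

```python
import math

def pki_from_nm(ki_nm):
    """Convert a Ki given in nM to pKi = -log10(Ki in mol/L)."""
    return 9.0 - math.log10(ki_nm)

def curate(records):
    """Apply the filters from the text; return kept entries and IDs needing manual review."""
    kept, flagged = {}, set()
    for r in records:
        if r["standard_relation"] != "=":          # drop censored data (> or <)
            continue
        if r["assay_type"] != "B" or r["standard_type"] != "Ki":
            continue                               # binding assays reporting Ki only
        if not r.get("canonical_smiles"):          # missing chemical structure
            continue
        if r["standard_units"] != "nM":
            continue
        cid = r["molecule_chembl_id"]
        if cid in kept:                            # duplicate ID: keep the first entry
            first = kept[cid]["standard_value"]
            ratio = max(first, r["standard_value"]) / min(first, r["standard_value"])
            if ratio > 10:                         # more than 10-fold apart: check by hand
                flagged.add(cid)
            continue
        kept[cid] = dict(r, pKi=pki_from_nm(r["standard_value"]))
    return list(kept.values()), flagged

# Invented records for illustration only (IDs are placeholders, not real ChEMBL entries).
base = {"standard_relation": "=", "assay_type": "B", "standard_type": "Ki",
        "standard_units": "nM", "canonical_smiles": "CCO"}
records = [
    dict(base, molecule_chembl_id="CHEMBL_A", standard_value=10.0),
    dict(base, molecule_chembl_id="CHEMBL_A", standard_value=500.0),
    dict(base, molecule_chembl_id="CHEMBL_B", standard_value=3.0, standard_relation=">"),
]
curated, flagged = curate(records)
```

In this toy input the second record is a duplicate that differs by 50-fold, so its ID is flagged for manual inspection, and the third record is discarded as censored.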
Fig. 2.

KNIME workflows to download OX1R bioactive ligands from ChEMBL and to clean the dataset using various criteria.
2.2. Feature calculation and preprocessing
All compounds were then imported to MOE for further processing. First, the Database Wash operation was performed for the structures (represented as SMILES strings) to remove salts, solvents, and other minor components in their structures. After the structures were cleaned, an MOE script of “db_unique.svl” was run to remove duplicates with identical structures. Next, 216 MOE built-in 2D molecular descriptors or features (here, both terms are used interchangeably) and MACCS fingerprints were calculated for all the retained structures. To identify potential outliers, a principal component analysis (PCA) was performed for all 2D descriptors using MOE. A 3D graphical plot was built with respect to the first three principal components (PC1, PC2 and PC3; Fig. 3). A molecule was considered as an “outlier” when it was located distant from the majority of data in the 3D graphical plot. After removal of the outliers in the dataset, clustering analysis was carried out to evaluate the chemical space in the dataset using MACCS fingerprints and the Tanimoto coefficient. A resulting dataset of 1350 OX1R ligands was submitted to a KNIME workflow for the feature selection.
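For readers unfamiliar with the similarity measure used in the clustering step, the Tanimoto coefficient between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below represents fingerprints as Python sets of on-bit indices. The greedy leader-style grouping is an assumption made for illustration; it is not necessarily the clustering algorithm implemented in MOE, and the three toy fingerprints are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def leader_cluster(fingerprints, threshold=0.8):
    """Greedy grouping: each molecule joins the first cluster whose leader is similar enough."""
    leaders, clusters = [], []
    for idx, fp in enumerate(fingerprints):
        for k, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                clusters[k].append(idx)
                break
        else:
            # no sufficiently similar leader found: start a new cluster
            leaders.append(fp)
            clusters.append([idx])
    return clusters

# Toy fingerprints (invented): the first two share 5 of 6 bits, the third shares none.
fps = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {10, 11, 12}]
clusters = leader_cluster(fps)
```

A larger number of clusters at a fixed threshold indicates a more structurally diverse dataset.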
Fig. 3.

The PCA plot of the entire dataset of OX1R ligands. The first three principal components (PCA1, PCA2, PCA3) were displayed. Two outliers were highlighted in pink.
All compounds in the dataset were imported to a KNIME workflow (Fig. 2) to remove descriptors with constant values. The number of features was further reduced by removing those highly correlated using the KNIME nodes of “Linear Correlation” and “Correlation Filter”. Linear correlation analysis was performed to calculate Pearson coefficients for each pair of descriptors. A pair of features with a Pearson correlation coefficient ≥0.7 were identified as highly correlated and one of them was eliminated as a redundant feature.
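The two descriptor filters can be approximated with NumPy as follows. This is a sketch under stated assumptions rather than a reproduction of the KNIME nodes: it uses the absolute value of the Pearson coefficient, it always drops the later column of a correlated pair, and the function name `filter_descriptors` and the toy descriptor matrix are hypothetical.

```python
import numpy as np

def filter_descriptors(X, names, threshold=0.7):
    """Drop constant columns, then one column of every highly correlated pair."""
    X = np.asarray(X, dtype=float)
    # Step 1: remove descriptors with constant values.
    varying = [j for j in range(X.shape[1]) if X[:, j].max() > X[:, j].min()]
    X, names = X[:, varying], [names[j] for j in varying]
    # Step 2: pairwise Pearson correlation between the remaining descriptors.
    corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))
    keep = []
    for j in range(X.shape[1]):
        # keep column j only if it is not highly correlated with an already kept column
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Toy matrix: d2 is a linear function of d1, d3 is weakly correlated, d4 is constant.
d1 = np.arange(10.0)
d2 = 2.0 * d1 + 1.0
d3 = np.array([1.0, -1.0] * 5)
d4 = np.full(10, 3.0)
X = np.column_stack([d1, d2, d3, d4])
X_filtered, kept_names = filter_descriptors(X, ["d1", "d2", "d3", "d4"])
```

Which member of a correlated pair is retained affects the final descriptor list, so the order of the columns matters in this greedy scheme.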
2.3. Machine learning (ML) algorithms
Four ML methods implemented using scikit-learn (a library for traditional machine learning algorithms for the Python programming language, version 1.1.1) were used to develop QSAR models for the dataset of OX1R ligands: random forest (RF), extreme gradient boosting (XGBoost), and both linear and radial basis function (RBF) support vector machine (SVM). These methods have been widely applied to predictions of molecular properties and compound activities with mixed performance.
2.4. Dataset partition
The entire dataset of 1350 OX1R ligands was randomly partitioned into a training set and a test set with the ratio of 4:1. Preliminary QSAR models using all 76 features were first developed to compare the performance of different ML methods. To avoid the effect of randomness in data splitting, 50 independent runs with different random seeds were conducted to assess each ML method in a comprehensive way. The values of R2 (squared correlation coefficient of the regression model) for the training and test sets were selected as the criteria to evaluate the statistical performance of the ML methods. R2 and R2pred are defined in Eq. (1). The method with the best collective performance for both training and test sets, RF, was selected for the current QSAR modeling study.
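The repeated random-partition protocol can be sketched with scikit-learn as shown below. The data are synthetic stand-ins generated with `make_regression`, not the OX1R descriptors, and the number of runs and trees is reduced so that the example finishes quickly; `repeated_split_r2` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the descriptor matrix and pKi values (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

def repeated_split_r2(X, y, n_runs=5, test_size=0.2):
    """Return training and test R2 values over n_runs random 4:1 partitions."""
    train_scores, test_scores = [], []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_tr, y_tr)
        train_scores.append(r2_score(y_tr, model.predict(X_tr)))
        test_scores.append(r2_score(y_te, model.predict(X_te)))
    return np.array(train_scores), np.array(test_scores)

train_r2, test_r2 = repeated_split_r2(X, y)
print(f"train R2: {train_r2.mean():.3f} +/- {train_r2.std():.3f}")
print(f"test  R2: {test_r2.mean():.3f} +/- {test_r2.std():.3f}")
```

Reporting the spread across seeds, rather than a single split, is what allows the methods to be compared on consistency as well as on average performance.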
2.5. Feature selection and hyper-parameter optimization
Recursive Feature Elimination (RFE) with cross-validation is commonly used for feature selection. In the current study, RFE with 5-fold cross-validation from scikit-learn (the RFECV module) was run for the entire training set to select the most important features for the RF method. Then the GridSearchCV module with 5-fold cross-validation from scikit-learn was used to optimize the hyper-parameters for the ML study on the training set and the features chosen by the RFECV module. R2 was used as the optimization criterion. The following parameters were used for the optimization process: n_estimators = [200,500,1000], max_features = [‘auto’, ‘sqrt’, ‘log2’], max_depth = [4,6,8,10,12].
After determining the most important features and optimal hyper-parameters, the external test set was used to validate the predictivity and generalization capability of the QSAR models. Fifty independent runs with different random seeds were conducted to avoid coincidence. A final QSAR model was built for the entire dataset of OX1R ligands with optimized molecular features and hyper-parameters (random_state = 439713).
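A minimal sketch of the two-stage procedure (feature selection with `RFECV`, followed by `GridSearchCV` on the selected features) is given below. It runs on synthetic data and uses a reduced grid so that it executes quickly; these values are assumptions for illustration and not the settings of the study. The `'auto'` option for `max_features` is omitted because it was deprecated and later removed in scikit-learn releases after the version cited here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training set (assumption, not the OX1R data).
X_train, y_train = make_regression(
    n_samples=150, n_features=8, n_informative=3, noise=5.0, random_state=0)

# Stage 1: recursive feature elimination with 5-fold cross-validation, scored by R2.
selector = RFECV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    step=1, cv=5, scoring="r2")
selector.fit(X_train, y_train)
X_selected = selector.transform(X_train)

# Stage 2: grid search over RF hyper-parameters on the selected features only.
param_grid = {
    "n_estimators": [30, 60],          # reduced from [200, 500, 1000] for speed
    "max_features": ["sqrt", "log2"],
    "max_depth": [4, 8],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=5, scoring="r2")
search.fit(X_selected, y_train)

print("features kept:", selector.n_features_)
print("best parameters:", search.best_params_)
```

Both stages see only the training set, so the external test set remains untouched until the final validation.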
2.6. Model performance evaluation
Three statistical indicators were adopted to evaluate the quality of the regression models, including the square correlation coefficients (R2) in Eq. (1), the root mean squared error (RMSE) in Eq. (2), and the mean absolute error (MAE) in Eq. (3).
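$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{1}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{2}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \tag{3}$$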
where yi and ŷi are the experimental and predicted values of the samples in the training set, respectively; ȳ is the mean experimental value of all the samples in the training set; and N is the number of samples in the training set. (Q2, RMSEcv, MAEcv) and (R2pred, RMSEpred, MAEpred) are used for the cross-validation dataset and the test set, respectively, and are defined analogously to Eqs. (1–3).
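The three indicators can be computed directly from paired experimental and predicted values. The sketch below assumes the usual definitions (R2 as one minus the ratio of the residual to the total sum of squares, which is also what scikit-learn's `r2_score` returns); the function name and the four example values are invented for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired experimental and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# Invented pKi values: two of the four predictions are off by 0.5 log units.
r2, rmse, mae = regression_metrics([6.0, 7.0, 8.0, 9.0], [6.5, 7.0, 7.5, 9.0])
```

Applying the same function to cross-validation or test-set predictions gives the corresponding Q2 or test-set statistics.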
2.7. Enrichment study
To further validate our QSAR models for virtual screening purposes, an enrichment study was performed. One hundred twenty-six highly active OX1R ligands (pKi ≥ 8.0) in the dataset were selected and clustered using MACCS fingerprints and Tanimoto coefficient (0.8) with MOE, resulting in 48 clusters. Ligands with the highest pKi values in each cluster were chosen as the active OX1R ligands to generate decoys on the DUD-E server (http://dude.docking.org) [46,47]. For each active ligand, 50 decoy molecules were generated that possess similar physicochemical properties but distinct molecular topology. Using the default setting in the DUD-E server, 48 active molecules produced 2499 decoys, resulting in an enrichment dataset of 2548 compounds for this study. These compounds were prepared using the same protocols as molecules in the training set. All 2D descriptors were calculated using MOE. The final QSAR model developed for the entire cleaned dataset was used to predict the binding constants (pKi) for all compounds in the enrichment dataset. Enrichment factor (EF) and goodness of hit list (GH) score, defined by Eqs. (4) and (5), were used as metrics to evaluate the predictivity of the QSAR model [48,49].
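$$\mathrm{EF} = \frac{H_t / H_a}{A / D} \tag{4}$$
$$\mathrm{GH} = \left[ \frac{H_t \, (3A + H_a)}{4 H_a A} \right] \left( 1 - \frac{H_a - H_t}{D - A} \right) \tag{5}$$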
where D is the total number of compounds in the dataset, including true actives and decoys, and A is the total number of true active compounds. Ht is the number of true actives in the hit list, and Ha is the total number of compounds predicted as active (hits) by the QSAR model.
EF measures how well a QSAR model can identify active molecules relative to a random selection. For example, a QSAR-based screening with EF = 10 means that the model identifies active compounds at 10 times the rate of random selection. The higher the EF, the more likely an active molecule would be selected from a given virtual screening. The GH score, which ranges from 0 to 1, considers both the ratio of true actives and the ratio of true inactives, and therefore measures how well a model can discriminate actives from inactives. Models with GH > 0.6 are generally considered reliable, while GH = 0 means a null model and GH = 1 means a perfect model [50].
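As a worked illustration, both metrics can be computed from four counts. The sketch below assumes the commonly used Güner-Henry forms of EF and GH and uses descriptive argument names instead of the symbols in the text; the screening counts are invented and are not results from this study.

```python
def enrichment_factor(true_hits, total_hits, actives, total):
    """Hit rate within the hit list divided by the hit rate expected at random."""
    return (true_hits / total_hits) / (actives / total)

def gh_score(true_hits, total_hits, actives, total):
    """Goodness-of-hit-list score (assumed Guner-Henry form), ranging from 0 to 1."""
    # weighted combination of precision (3/4 weight) and recall (1/4 weight)
    yield_and_recall = true_hits * (3 * actives + total_hits) / (4 * total_hits * actives)
    # penalty for decoys that were wrongly predicted as active
    false_positive_penalty = 1 - (total_hits - true_hits) / (total - actives)
    return yield_and_recall * false_positive_penalty

# Invented screen: 1000 compounds, 20 true actives, 10 predicted hits, 8 of them correct.
ef = enrichment_factor(true_hits=8, total_hits=10, actives=20, total=1000)
gh = gh_score(true_hits=8, total_hits=10, actives=20, total=1000)
print(f"EF = {ef:.1f}, GH = {gh:.3f}")
```

In this example the hit list is 80% active against a 2% background rate, which gives an EF of about 40, while the GH score of roughly 0.70 reflects that fewer than half of the actives were recovered.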
2.8. QSAR-based virtual screening
To further assess the predictivity of our QSAR models, virtual screening was performed against the DrugBank database (version 5.1.9, https://www.drugbank.ca) [44], which contains 2715 drugs approved by the US Food and Drug Administration (FDA). Peptide-like and inorganic molecules were eliminated from selection. Furthermore, drugs with molecular weights below 250 or above 700 g/mole were removed, resulting in a filtered DrugBank database of 1614 drugs for the current study.
Daridorexant (an OX1R/OX2R antagonist) was approved for clinical use by the FDA in the United States in January 2022. Therefore, when the current DrugBank database was released on Jan 4th, 2022, daridorexant was not included in the collection. Consequently, the DrugBank database was supplemented by the addition of this drug to evaluate the predictive performance of our QSAR model. All drug molecules were prepared using the same protocols as molecules in the training set. All 2D descriptors were calculated using MOE. The top-ranked hits were explored to assess the performance of the QSAR model for virtual screening.
2.9. In vitro radiolabeled binding assay
Radiolabeled binding assays of the OX1R were performed by Eurofins Panlabs Discovery Services (Taipei, Taiwan) using human recombinant CHO-S cells, according to a previously reported protocol [51]. The binding assay was conducted in duplicate on membrane preparations that had been re-suspended at pH 7.4 in a buffer consisting of 25 mM Hepes/NaOH, 2.5 mM MgCl2, 2.5 mM CaCl2, 0.5 mM EDTA, and 0.025% Bacitracin. [125I] Orexin A (0.1 nM) was used as the radioligand. Following a 1-h incubation at 25 °C, the binding assay was terminated by addition of cold buffer. The mixture was then filtered through Whatman GF/B filters and washed with cold buffer. Radioactivity was determined using the TopCount NTX liquid scintillation counter (PerkinElmer, Waltham, MA). Non-specific binding for the OX1R was measured in the presence of 1.0 μM unlabeled SB-334867. Receptor binding data were analyzed by nonlinear regression of saturation and competition curves using the GraphPad Prism 8.0 software (GraphPad Software, La Jolla, CA).
3. Results and discussion
3.1. Data curation
Best practices recommended by the OECD were followed for pooling data from disparate sources for the present QSAR study [52]. In order to achieve statistically significant and predictive models, several heuristic rules were adopted in selecting the dataset: (1) the minimum range of bioactivities should be three log units, (2) the biological activities should be distributed evenly throughout the range, and (3) the dataset should contain diverse compounds to develop a comprehensive QSAR model [53]. Ideally, all compounds should come from a single source to minimize the inevitable variation in pharmacological data measured by different laboratories. When this is not possible, as is the case with the subject OX1R antagonists, rigorous validation should be done to evaluate pooling data from different sources, which has been shown to be a feasible strategy by our previous QSAR studies [54–56].
ChEMBL is a valuable resource for chemical and biological data [43]; however, the data are highly heterogeneous and cannot be used for QSAR modeling studies in their current state. Consequently, we developed KNIME workflows to rigorously clean and filter the bioactivity datasets downloaded from ChEMBL for the present QSAR studies (Fig. 2). A ChEMBL search using the ID for human OX1R (CHEMBL5113) returned 5123 bioactivities associated with this orexin receptor, 2308 of which were reported with exact measurements in binding assays. Filtering with Ki as the binding affinity measurement yielded 1642 entries containing the chemical structure information. After removing duplicates, 1369 unique compounds were retrieved, and their binding constant (Ki) values were converted to pKi values. The experimental biological activities (pKi) in the entire dataset range from 4.18 to 10.22, a spread of more than 6 logarithmic units: 589 weakly active compounds (pKi < 6.0), 654 moderately active compounds (6.0 ≤ pKi < 8.0), and 126 highly active compounds (pKi ≥ 8.0). The entire dataset of 1369 OX1R ligands was further processed using MOE to remove redundant structures and heavy compounds (molecular weights ≥ 1000 g/mole), since here we were only interested in small molecules.
MOE’s built-in descriptors cover a wide range of molecular properties, including physical properties, atom and bond counts, Hückel theory descriptors, subdivided surface area descriptors, shape and charge descriptors, adjacency and distance matrix descriptors, and pharmacophore feature descriptors. These descriptors have been widely used in QSAR modeling of physicochemical properties and bioactivities. In the current study, we started with all MOE 2D descriptors trying to utilize all ligand information for QSAR modeling.
A fundamental assumption of all QSAR studies is the Similar Property Principle stating that structurally similar compounds possess similar activities or properties. However, it is not uncommon that small structural changes in some compounds can translate to large changes in activities and properties, in violation of this principle. Such compounds are classified as activity cliffs or outliers, and should be avoided in any QSAR modeling study [53]. To address this issue, we conducted a Principal Component Analysis (PCA) on the 2D molecular descriptors computed using MOE for the entire dataset. A molecule is considered an outlier if it is located distant from most data in the 3D graphical plot of the first three principal components (PC1, PC2 and PC3; Fig. 3). Two activity outliers with the highest molecular weight in this dataset were identified by the PCA and, therefore, removed from the QSAR modeling (Fig. 3), resulting in 1350 unique OX1R ligands in the cleaned dataset. Clustering analysis was performed to evaluate the chemical space in the dataset using MACCS fingerprints and a Tanimoto coefficient of 0.80, a recommended value for similarity-based analysis [57]. A total of 518 clusters were identified, suggesting a highly diverse dataset.
Furthermore, 30 features (e.g., a_nP: number of phosphorus atoms) had constant values across the dataset and were therefore eliminated by the KNIME workflows (Fig. 2). Pair-wise linear correlation analysis of the remaining 186 descriptors removed 110 redundant features, reducing the number of features to 76. The resulting dataset includes 1350 ligands and 76 features.
3.2. Preliminary QSAR modeling
A wide variety of machine learning (ML) algorithms have been applied to develop QSAR models; however, no single one has been proven to be superior to others. Recent studies of ML methods suggest RF, XGBoost, and SVM are among the best performers [58,59]. RF is an ensemble learning method that combines multiple decision trees to create a more robust and accurate predictive model. It works by constructing a collection of decision trees during training and then averaging their predictions during inference. XGBoost is another ensemble learning technique that focuses on boosting multiple weak learners, typically decision trees, into a strong predictive model. It employs a gradient boosting algorithm to iteratively add trees that correct the errors of the previous ones, thereby improving the overall model performance. SVM seeks to find the optimal hyperplane that best separates data points from different classes or predicts the target values in a regression context. SVMs can be built with different kernels, which allow them to operate effectively in higher-dimensional spaces without explicitly transforming the data. The linear and radial basis function (RBF) kernels are commonly used for linear and complex (nonlinear) data analysis (SVM-linear and SVM-RBF), respectively. Meanwhile, descriptor-based QSAR models have been shown to generally produce more accurate predictions and more easily interpreted models than graph neural networks, although the latter have gained substantial attention and been broadly applied to various endpoints [60]. In the current study, we based our QSAR models on molecular descriptors using 4 different ML methods: RF, XGBoost, SVM-Linear and SVM-RBF.
A random partition using the ratio of 4:1 yielded a training set of 1080 compounds and a test set of 270 compounds. Preliminary QSAR models using all 76 features were constructed for the training set and then applied to the test set for validation with four ML methods. The model statistics are shown in Fig. 4. The averaged R2 values among 50 independent runs on the training set were 0.945 ± 0.002, 0.994 ± 0.002, 0.498 ± 0.012, and 0.502 ± 0.014 for RF, XGBoost, SVR-RBF and SVR-Linear methods, respectively. Meanwhile, the averaged R2pred values on the test set were 0.664 ± 0.026, 0.649 ± 0.049, 0.405 ± 0.066, and 0.384 ± 0.083 for RF, XGBoost, SVR-RBF and SVR-Linear methods, respectively. RF and XGBoost clearly yielded much better statistical results than the SVR methods in both the training and test sets. However, XGBoost produced a large performance variation for the test set in terms of R2pred, ranging from 0.506 to 0.727. On the other hand, RF yielded consistent performance for the test set with R2pred ranging from 0.601 to 0.733, suggesting that the RF model statistics are less likely to have arisen by chance. Taken together, RF was selected as the ML method for the QSAR study.
Fig. 4.

Boxplots comparing the performance of 4 different ML methods. Left and right are R2 values for the training and test sets, respectively. Red lines represent medians; open circles represent outliers.
3.3. Feature selection
Five-fold cross-validation is a widely used procedure to evaluate the performance of ML models and was adopted in the current QSAR study for feature selection and hyper-parameter optimization. During this process, the whole training set was randomly partitioned into five roughly equal-sized groups. The QSAR model was constructed using the data in four groups, and the prediction statistics for the remaining group were calculated. The process was repeated five times so that every group was used once as the validation set. Recursive feature elimination (RFE) is a feature selection method that removes the least relevant features for ML models until the specified number of features is reached. The RFECV technique in scikit-learn (recursive feature elimination with 5-fold cross-validation) was employed to identify the most relevant features for ML studies. The model performance measured by cross-validation Q2 scores is shown in Fig. 5. A subset of 12 features yielded the best cross-validation Q2 score.
Fig. 5.

Cross-validation scores (Q2) and the number of features selected by RFECV. The blue line indicates the optimal 12 features selected by this process.
These features include 1 physical property descriptor (mr), 1 bond count descriptor (opr_nrot), 1 Hückel theory descriptor (h_ema), 1 partial charge descriptor (PEOE_VSA+0), 3 subdivided surface area descriptors (SMR_VSA4, SlogP_VSA3, SlogP_VSA7), and 5 adjacency and distance matrix descriptors (balabanJ, BCUT_PEOE_0, BCUT_PEOE_3, GCUT_SLOGP_0, GCUT_PEOE_0). Brief descriptions of these descriptors are shown in Table 1. Their maximum and minimum values define the applicability domain (AD) for the QSAR models. The AD is defined as “a theoretical region in the chemical space constructed by both the model descriptors and modeled response” [61]. The concept of the AD resonates with the Similar Property Principle described above and is highly recommended by the OECD to define the model limitations related to its chemical space and biological response domain. The main methods for defining a predictive QSAR AD are based on physical-chemical properties, molecular fingerprints or fragments, and the response space [62, 63]. Different similarity metrics with respect to the model’s training set have also been employed, including range- and distance-based methods. Each AD approach has its own strengths and limitations, and the choice depends on the purpose of the study. In the present study, the AD is defined using the physical-chemical properties (12 selected descriptors) with a range-based similarity metric, together with the pKi values (4.18–10.22) (Table 1).
Table 1.
Descriptions of 12 selected features by the RFECV procedure.
| Descriptor Name | Category | Description | Max | Min |
|---|---|---|---|---|
| mr | Physical Properties | Molecular refractivity | 15.36 | 8.68 |
| h_ema | Hückel Theory | Sum of hydrogen bond acceptor strengths | 13.81 | 3.47 |
| opr_nrot | Bond Counts | Number of rotatable bonds | 12 | 2 |
| PEOE_VSA+0 | Partial Charge | Sum of van der Waals (vdW) surface areas of all atoms with PEOE partial charges in the range [0.00,0.05] | 208.36 | 0 |
| SMR_VSA4 | Subdivided Surface Area Descriptors | Sum of vdW surface areas of all atoms with molar refractivity in the range [0.39,0.44] | 57.39 | 0 |
| SlogP_VSA3 | Subdivided Surface Area Descriptors | Sum of vdW surface areas of all atoms with logP in the range [0.35,0.39] | 152.49 | 0 |
| SlogP_VSA7 | Subdivided Surface Area Descriptors | Sum of vdW surface areas of all atoms with logP > 0.56 | 294.18 | 24.45 |
| balabanJ | Adjacency and Distance Matrix Descriptors | Balaban’s connectivity topological index | 1.85 | 0.98 |
| BCUT_PEOE_0 | Adjacency and Distance Matrix Descriptors | The smallest eigenvalue in the BCUT_PEOE calculations. The BCUT descriptors are calculated from the eigenvalues of a modified adjacency matrix | −2.26 | −2.9 |
| BCUT_PEOE_3 | Adjacency and Distance Matrix Descriptors | The largest eigenvalue in the BCUT_PEOE calculations | 2.88 | 2.26 |
| GCUT_SLOGP_0 | Adjacency and Distance Matrix Descriptors | The smallest eigenvalue in the GCUT_SLOGP calculations. The GCUT descriptors are calculated from the eigenvalues of a modified graph distance adjacency matrix | −0.73 | −1.37 |
| GCUT_PEOE_0 | Adjacency and Distance Matrix Descriptors | The smallest eigenvalue in the GCUT_PEOE calculations | −0.77 | −0.89 |
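A range-based applicability domain check amounts to testing whether each descriptor of a query molecule lies within the minimum and maximum observed for the training data. The sketch below uses three of the ranges listed in Table 1 for brevity; the helper name `outside_domain` and the descriptor values of the query molecule are invented for illustration.

```python
# (min, max) ranges for three of the twelve descriptors, taken from Table 1.
AD_RANGES = {
    "mr": (8.68, 15.36),
    "opr_nrot": (2, 12),
    "balabanJ": (0.98, 1.85),
}

def outside_domain(descriptors, ranges=AD_RANGES):
    """Return the names of descriptors whose values fall outside the listed ranges."""
    return [name for name, (low, high) in ranges.items()
            if not low <= descriptors[name] <= high]

# Invented query molecule with too many rotatable bonds.
query = {"mr": 12.1, "opr_nrot": 14, "balabanJ": 1.40}
violations = outside_domain(query)
```

A prediction for a molecule with one or more violations is an extrapolation beyond the chemical space covered by the model and should be treated with corresponding caution.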
3.4. Hyper-parameter optimization
After the most relevant features were selected, the 5-fold cross-validation grid search technique (GridSearchCV) was employed to optimize three hyper-parameters for the RF models on the training set: n_estimators, the number of trees in the forest; max_features, the number of features to consider when searching for the best split; and max_depth, the maximum depth of each decision tree. The optimized hyper-parameters were: n_estimators = 500; max_features = ‘sqrt’; max_depth = 12. The resulting RF QSAR model demonstrated excellent stability with R2 = 0.925 and robust cross-validation predictivity with Q2 = 0.659, with low prediction errors (RMSE = 0.311 and MAE = 0.240; RMSEcv = 0.672 and MAEcv = 0.506).
3.5. Model evaluation
In QSAR studies, cross-validation techniques are commonly used to assess the predictive ability of the model. Although a high value of Q2 is necessary and important, it is not a sufficient condition for a model to possess high predictivity. To further evaluate the predictive power of the QSAR model, an external test set of 270 compounds, which were not included in model generation, was submitted for prediction of binding affinity by the RF QSAR model based on the optimized hyper-parameters. Robust predictivity was observed for the test set, with good prediction statistics (RMSEpred = 0.694, MAEpred = 0.450).
Since random partitions were adopted in the current study, we sought to evaluate how randomness affects model performance. Fifty independent RF models using the optimized hyper-parameters but with different random seeds were built for the training set and then applied to the test set. The results for the 50 QSAR models, summarized in Table 2, demonstrated excellent internal stability and predictive power with R2 = 0.915 – 0.927 and R2pred = 0.601 – 0.733, also indicating that the model statistics are unlikely to have arisen by chance. The prediction results for the QSAR model with random seed = 439713 are plotted in Fig. 6, showing strong correlation between experimental and predicted pKi values for the OX1R ligands. The distribution plot of the residuals (differences) between corresponding values of experimental and predicted pKi reveals that the QSAR model prediction errors are randomly distributed around the zero line (Fig. 7), denoting the absence of systematic errors that usually arise from biased measurement or calculation. Only 4 out of 1080 (0.37%) compounds in the training set were predicted with residuals > 1 log unit, demonstrating the high quality of the dataset of OX1R ligands for the present QSAR study. A final QSAR model was constructed for the cleaned dataset of 1350 OX1R ligands (R2 = 0.920, RMSE = 0.322, and MAE = 0.247) and employed for the enrichment study and virtual screening.
Table 2.
QSAR model statistics for 50 independent runs with different random seeds.
| Statistic | R2 (training set) | RMSE (training set) | MAE (training set) | R2pred (test set) | RMSEpred (test set) | MAEpred (test set) |
|---|---|---|---|---|---|---|
| mean | 0.921 | 0.319 | 0.244 | 0.679 | 0.645 | 0.482 |
| sd* | 0.003 | 0.061 | 0.004 | 0.028 | 0.193 | 0.022 |
| min | 0.915 | 0.305 | 0.235 | 0.601 | 0.579 | 0.444 |
| median | 0.920 | 0.320 | 0.244 | 0.680 | 0.645 | 0.480 |
| max | 0.927 | 0.330 | 0.252 | 0.733 | 0.738 | 0.559 |

*sd: standard deviation.
Fig. 6.

Scatter plots of QSAR model predicted and experimental pKi values. Left: The training set; Right: The test set. The fitted lines are shown in each plot.
Fig. 7.

Scatter plot for residual values for the training set. The red line is for the residual values = 0. Residual = Exp. pKi − Pred. pKi.
Thorough validation is a crucial step in the development of QSAR models. In addition to using an external test set to validate our models, we conducted an enrichment study using a decoy dataset to further evaluate the predictivity of our QSAR model. From the curated OX1R ligand set, 126 highly active compounds (pKi ≥ 8.0) were identified. Subsequent clustering of these ligands using MACCS fingerprints and a Tanimoto coefficient threshold of 0.8 resulted in 48 diverse clusters. A decoy dataset for the 48 clusters of active OX1R ligands was generated on the DUD-E server (http://dude.docking.org) [46,47]. For each active ligand, 50 decoy molecules were generated that possess similar physicochemical properties but distinct molecular topology. Our QSAR model was employed to predict the pKi values for the entire decoy dataset, including actives and decoys. Enrichment factor (EF) and goodness of hit list (GH) score, defined by Eqs. (4) and (5), were used as metrics to evaluate the predictivity of the QSAR model. Using pKi = 8.0 (Ki = 10 nM) as the cutoff to separate actives and inactives, the 11 compounds predicted as actives by the QSAR model were all true actives, yielding EF = 51 and GH = 0.81. If a half-log unit is tolerated in prediction, using pKi = 7.5 (Ki = 30 nM) as the cutoff, 32 of the 40 compounds predicted as actives by the QSAR model are true actives, yielding EF = 41 and GH = 0.76. For this decoy dataset, EF is equal to 2 for a random selection. These results indicate that our QSAR model is 20–25 times more likely to identify active OX1R compounds than a random selection. Meanwhile, GH > 0.70 suggests that our model can discriminate actives from inactives during a virtual screening. Taken together, this enrichment study demonstrates that our RF-based QSAR model is highly predictive and discriminative at finding actives and separating them from inactives.
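The reported enrichment metrics can be checked with a short calculation. The sketch below assumes that Eqs. (4) and (5) take the commonly used forms EF = (Ha/Ht)/(A/D) and the Güner-Henry GH score, and that the decoy dataset contains D = 48 + 48 × 50 = 2448 compounds with A = 48 actives; the function names are hypothetical. Under these assumptions the calculation reproduces the EF and GH values quoted above.

```python
def enrichment_factor(ha, ht, a, d):
    """EF = (Ha / Ht) / (A / D): hit rate in the list relative to the database."""
    return (ha / ht) / (a / d)


def goodness_of_hit(ha, ht, a, d):
    """Guner-Henry score: [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    return (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a))


A = 48            # actives in the decoy dataset (one per cluster, assumed)
D = A + A * 50    # actives plus 50 decoys per active = 2448 (assumed)

# Cutoff pKi = 8.0: 11 predicted actives, all true actives.
ef_80 = enrichment_factor(11, 11, A, D)
gh_80 = goodness_of_hit(11, 11, A, D)

# Cutoff pKi = 7.5: 40 predicted actives, 32 true actives.
ef_75 = enrichment_factor(32, 40, A, D)
gh_75 = goodness_of_hit(32, 40, A, D)
```

Here Ha is the number of true actives in the hit list, Ht the size of the hit list, A the number of actives in the dataset, and D the total number of compounds.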
3.6. Virtual screening and biological testing
Validation studies using the external test set and the enrichment study demonstrated the robustness and excellent predictivity of our QSAR models. As a first case study, our QSAR model was used for virtual screening of the filtered DrugBank database, which comprises 1615 FDA-approved drugs. Three FDA-approved orexin receptor antagonists (suvorexant, lemborexant and daridorexant) ranked among the top 4 of the hit list, indicating that our QSAR model can identify true active ligands for OX1R. We visually inspected the top 100 compounds predicted by the QSAR model and acquired four drugs (macitentan, zafirlukast, isavuconazole, and cabozantinib) for in vitro testing based on their commercial availability and structural diversity. Initial biological evaluation of human OX1R binding affinity was conducted in duplicate in human recombinant CHO-S cells at a concentration of 10 μM according to a previously reported protocol [51]. [125I]Orexin A (0.1 nM) was used as the radioligand, while unlabeled SB-334867 (1.0 μM) was used to determine non-specific binding for the OX1R. Both isavuconazole and cabozantinib exhibited 30–40% binding to the OX1R under the tested conditions (Table 3); solubility issues with both compounds precluded screening at higher concentrations. In contrast, macitentan and zafirlukast showed minimal (~10%) binding to OX1R in this assay.
Table 3.
Two FDA-approved drugs exhibited binding affinity for human OX1R in the radiolabeled binding assay.ᵃ

| Name | %Binding | Indication |
|---|---|---|
| Isavuconazole | 31 | Antifungal |
| Cabozantinib | 38 | Anticancer |

ᵃ The radiolabeled binding assay of the OX1R was performed by Eurofins Panlabs Discovery Services in human recombinant CHO-S cells at a drug concentration of 10 μM using [125I]Orexin A as the radioligand, according to the protocol previously reported [51].
To our knowledge, neither isavuconazole nor cabozantinib has been reported to exhibit any binding to OX1R. Isavuconazole has been approved as an antifungal medication for invasive and severe infections such as aspergillosis and mucormycosis [64]. It acts as an inhibitor of lanosterol 14α-demethylase, whose inhibition can disrupt the membrane structures of fungal cells [65]. Cabozantinib, on the other hand, is a broad-spectrum inhibitor of tyrosine kinases such as c-Met, VEGFR2, and Axl [66], and has been approved to treat several cancers, including thyroid cancer, renal cell carcinoma, and hepatocellular carcinoma [67,68]. This is interesting, as OX1Rs (but not OX2Rs) are expressed in human digestive cancers but not in corresponding healthy epithelia [69], opening the possibility that some of the therapeutic efficacy of cabozantinib might be due to its (previously unappreciated) actions at OX1R. Consistent with this, almorexant, which acts at both OX1R and OX2R, has previously been reported to have anti-tumoral properties in vitro, in vivo, and ex vivo [70]. Moreover, fatigue and reduced appetite are among the most common adverse events associated with cabozantinib treatment [71], which aligns with a role for the orexin system in arousal and feeding, respectively. Thus, although considerable additional investigation is required, these new findings may help to further elucidate the mechanisms underlying the therapeutic effects and side effect profiles of these two drugs, and might even raise opportunities for their repurposing for other clinical indications.
We further investigated why macitentan and zafirlukast were false positives in the virtual screening. We compared the values of the 12 molecular descriptors selected for QSAR modeling and found that both macitentan and zafirlukast have one descriptor falling outside the AD defined by the minimum and maximum values of each descriptor (Table 1). In contrast, all descriptors for both isavuconazole and cabozantinib fall within the AD. We therefore speculate that issues with the AD may be partially responsible for the inaccurate predictions by our model. The AD plays an important role in estimating the uncertainty of predictions for test compounds based on how similar they are to the training compounds used for developing the QSAR model. Compounds flagged as AD non-compliant should be de-prioritized for further studies. In our case, future efforts should determine whether new compounds fall within the ADs of all descriptors used for model development before biological testing.
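A range-based AD check of the kind described here can be sketched in a few lines. The example below is a minimal illustration that assumes the AD is the bounding box spanned by the minimum and maximum of each descriptor over the training compounds; the function names and the toy descriptor values are hypothetical and are not the values in Table 1.

```python
def descriptor_ranges(training_rows):
    """Per-descriptor (min, max) over the training compounds."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def out_of_domain(compound, ranges):
    """Indices of descriptors whose value lies outside the training range."""
    return [
        i
        for i, (value, (low, high)) in enumerate(zip(compound, ranges))
        if value < low or value > high
    ]


# Hypothetical training matrix: 3 compounds x 3 descriptors.
training = [
    [1.0, 200.0, 3.0],
    [2.5, 350.0, 5.0],
    [4.0, 500.0, 4.0],
]
ranges = descriptor_ranges(training)

# A compound inside the box, and one whose second descriptor exceeds the maximum.
inside = out_of_domain([3.0, 420.0, 4.5], ranges)
outside = out_of_domain([3.0, 620.0, 4.5], ranges)
```

A screening hit would be flagged as AD non-compliant whenever the returned list is non-empty, which is the situation described above for macitentan and zafirlukast.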
4. Conclusions
Molecular descriptor-based QSAR models were developed to gain insights into the structure-activity relationships of novel OX1R ligands and, ultimately, to guide their virtual screening and rational design. Best practices recommended by the OECD were followed to develop statistically robust and predictive QSAR models. A large and diverse dataset of OX1R ligands was curated by pooling ligand information from ChEMBL. The ChEMBL ID for human OX1R was used as the query to search the ChEMBL database, which produced more than 5000 bioactivity data entries. KNIME workflows were constructed to rigorously clean and filter the heterogeneous bioactivity dataset downloaded from ChEMBL, resulting in a refined dataset of 1369 structurally diverse OX1R ligands with pKi as the bioactivity measure. PCA was then conducted to eliminate structural or activity outliers from the dataset assembled for the QSAR studies. Preliminary QSAR studies suggested that RF generated the best statistics among four ML methods (RF, XGBoost, SVM-Linear and SVM-RBF) for this OX1R ligand dataset. Recursive feature elimination with 5-fold cross-validation was performed to select twelve out of 216 MOE 2D molecular descriptors as the most relevant for the current QSAR modeling. Hyper-parameters for the RF algorithm were further optimized by a 5-fold cross-validation grid search. Descriptor-based QSAR models were constructed with the selected features and optimized RF hyper-parameters by randomly dividing the modeling set into a training and a test set at a ratio of 4:1. To evaluate how random partitions affect model performance, 50 independent RF models with different random seeds were built on the training set and then applied to the test set. The results indicate that the model statistics are unlikely to be due to chance, with excellent internal stability (R2 = 0.915–0.927) and predictive power (R2pred = 0.601–0.733).
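The modeling steps summarized above can be sketched with scikit-learn. The text does not state which software implemented the RF, feature elimination, and grid search steps, so the library choice is an assumption, as are the synthetic data, the small hyper-parameter grid, the order of splitting and feature selection, and the choice to vary only the RF seed in the final loop. The sketch illustrates the general technique rather than the authors' exact workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the descriptor matrix and pKi values (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
y = 7.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=150)

# 4:1 random split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Recursive feature elimination with 5-fold cross-validation.
selector = RFECV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    step=1, cv=5, scoring="r2", min_features_to_select=2,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# 5-fold grid search over a small, illustrative hyper-parameter grid.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [30, 60], "max_features": [1.0, "sqrt"]},
    cv=5, scoring="r2",
)
grid.fit(X_train_sel, y_train)

# Refit with the chosen hyper-parameters under several random seeds
# (five here for speed; the study used fifty).
test_r2 = []
for seed in range(5):
    model = RandomForestRegressor(random_state=seed, **grid.best_params_)
    model.fit(X_train_sel, y_train)
    test_r2.append(r2_score(y_test, model.predict(X_test_sel)))
```

The spread of `test_r2` across seeds plays the role of the test-set columns in Table 2; a narrow spread suggests that performance does not hinge on one particular random initialization.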
Enrichment studies were conducted to further evaluate the predictivity of our QSAR models, using a decoy set generated by the DUD-E server for the most potent OX1R ligands. These studies confirmed the excellent predictivity of the models, with EF > 40 and GH > 0.70. To further evaluate the predictive power of our models, QSAR model-based virtual screening was carried out against the filtered DrugBank database. Three FDA-approved orexin receptor antagonists (suvorexant, lemborexant and daridorexant) ranked among the top 4 identified ‘hits’, indicating that our QSAR model can accurately identify true active ligands for OX1R. Two FDA-approved drugs, isavuconazole and cabozantinib, were found to exhibit 30–40% binding to the OX1R in in vitro radiolabeled binding assays under the test conditions. To our knowledge, this is the first report of actions at the OX1R for either of these compounds, and it may thus be useful in further understanding the mechanisms underlying their therapeutic uses and side effects.
It should be emphasized that these QSAR models are statistically valid only within their applicability domains with respect to their range of binding affinities (pKi = 4.18–10.22) and the chemical space represented by the molecular descriptors of the OX1R ligands in the present study. Furthermore, our QSAR models can predict only binding affinities (pKi) for OX1R, not whether a ligand functions as a receptor agonist or antagonist. Moreover, our models cannot predict ligand binding selectivity for OX1R vs. OX2R. This will be important for future work to address, as some physiological processes are preferentially mediated by signaling at OX1R vs. OX2R [7,72,73]. Indeed, we intend to extend our work here to further develop QSAR models to identify compounds that act at OX2R. Considering the high degree of similarity between the two orexin receptors [1], these QSAR models are expected to share similar features and overlapping ADs. Any unique features might be useful in determining ligand specificity for either OX1R or OX2R.
In recent years, the field of QSAR has witnessed a significant advance in chemical representation and ML methods. Beyond the conventional molecular descriptors, the inclusion of molecular graphs and sophisticated fingerprints has empowered the encoding of intricate structural information [74,75]. Furthermore, quantum chemical descriptors have added a new dimension to chemical representations, offering a more realistic portrayal of molecular interactions and properties [76,77]. Meanwhile, the emergence of deep learning has revolutionized QSAR by allowing the development of highly complex and sophisticated models. Deep neural networks (NNs), such as convolutional and recurrent NNs, have demonstrated exceptional performance in extracting features from molecular data [78,79]. While these NN-based models excel at capturing complex SARs, they often demand large datasets and can be perceived as ‘black boxes’ with limited interpretability. In contrast, traditional QSAR models are generally more interpretable and computationally efficient but may struggle to capture non-linear SARs between molecular structures and activities. The performance of NN-based QSAR models in property predictions has been widely studied, yielding mixed results [60,80–82]. Comprehensive comparisons between traditional and NN-based ML methods indicate that no single method has conclusively proven superior to the other [58,59]. The selection of chemical representations and ML methods depends on many factors, including the availability and quality of data, as well as the objectives of the study.
In our current study, we present a rigorous data curation workflow and a highly predictive QSAR model based on a traditional RF method and only 12 physicochemical properties. This model offers ease of interpretation, making it particularly appealing to medicinal chemists, and it can be readily adapted to address various other pharmacological targets. In addition, this report represents the first successful attempt to develop comprehensive QSAR models for a large dataset of more than 1300 structurally diverse OX1R ligands. These efforts should prove useful for the identification of new drug leads and for prediction of their OX1R binding affinity (pKi) prior to the resource-demanding tasks of chemical synthesis and experimental biological evaluation. Moreover, these QSAR models might be useful as a filter to exclude molecules that exhibit off-target activity at the OX1R in drug discovery campaigns.
Supplementary Material
Funding
This research was supported by the National Institute on Drug Abuse: R00 DA045765 (MHJ) and T32 DA055569 (SLO).
Abbreviations:
- QSAR: quantitative structure-activity relationship
- OX1R: orexin 1 receptor
- OX2R: orexin 2 receptor
- GPCR: G-protein coupled receptor
- PDB: Protein Data Bank
- MOE: Molecular Operating Environment
- PCA: principal component analysis
- RF: random forest
- XGBoost: extreme gradient boosting
- SVM: support vector machine
- RBF: radial basis function
- RFECV: recursive feature elimination with cross-validation
- RMSE: root mean squared error
- MAE: mean absolute error
- EF: enrichment factor
- GH: goodness of hit list
- AD: applicability domain
Footnotes
CRediT authorship contribution statement
Conceptualization, W.J.W. and M.H.J.; methodology, W.J.W. and V.Y.Z.; software, W.J.W. and V.Y.Z.; validation, W.J.W., V.Y.Z. and M.H.J.; formal analysis, V.Y.Z.; investigation, V.Y.Z.; resources, W.J.W., V.Y.Z., S.L.O., and M.H.J.; data curation, V.Y.Z.; writing – original draft preparation, V.Y.Z. and M.H.J.; writing – review and editing, S.L.O., W.J.W. and M.H.J.; visualization, W.J.W., V.Y.Z. and M.H.J.; supervision, W.J.W. and M.H.J.; project administration, S.L.O., W.J.W. and M.H.J.; funding acquisition, M.H.J. All authors have read and agreed to the published version of the manuscript.
Declaration of Competing Interest
MHJ is an inventor on a provisional patent application (63/601,522) which describes therapeutic methods and drug discovery methods relevant to the orexin 1 receptor. All other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at doi:10.1016/j.aichem.2023.100040.
Data Availability
The DrugBank database can be found at https://www.drugbank.com/. The ChEMBL database can be found at https://www.ebi.ac.uk/chembl/.
References
- [1] Sakurai T, et al., Orexins and orexin receptors: a family of hypothalamic neuropeptides and G protein-coupled receptors that regulate feeding behavior, Cell 92 (4) (1998) 573–585.
- [2] de Lecea L, et al., The hypocretins: hypothalamus-specific peptides with neuroexcitatory activity, Proc. Natl. Acad. Sci. U. S. A 95 (1) (1998) 322–327.
- [3] Thannickal TC, et al., Reduced number of hypocretin neurons in human narcolepsy, Neuron 27 (2000).
- [4] Kilduff TS, Peyron C, The hypocretin/orexin ligand-receptor system: implications for sleep and sleep disorders, Trends Neurosci. 23 (8) (2000) 359–365.
- [5] Marcus JN, et al., Differential expression of orexin receptors 1 and 2 in the rat brain, J. Comp. Neurol 435 (1) (2001) 6–25.
- [6] James MH, Aston-Jones G, Orexin reserve: a mechanistic framework for the role of Orexins (Hypocretins) in addiction, Biol. Psychiatry 92 (11) (2022) 836–844.
- [7] Mahler SV, et al., Motivational activation: a unifying hypothesis of orexin/hypocretin function, Nat. Neurosci 17 (10) (2014) 1298–1303.
- [8] James MH, Campbell EJ, Dayas CV, Role of the orexin/hypocretin system in stress-related psychiatric disorders, Curr. Top. Behav. Neurosci 33 (2017) 197–219.
- [9] James MH, Mahler SV, Moorman DE, Aston-Jones G, A decade of orexin/hypocretin and addiction: where are we now? Curr. Top. Behav. Neurosci 33 (2017) 247–281.
- [10] Tyree SM, Borniger JC, de Lecea L, Hypocretin as a hub for arousal and motivation, Front Neurol. 9 (2018) 413.
- [11] Johnson PL, et al., Orexin, stress, and anxiety/panic states, Prog. Brain Res 198 (2012) 133–161.
- [12] Mehr JB, Mitchison D, Bowrey HE, James MH, Sleep dysregulation in binge eating disorder and “food addiction”: the orexin (hypocretin) system as a potential neurobiological link, Neuropsychopharmacology (2021).
- [13] Winrow CJ, et al., Promotion of sleep by suvorexant-a novel dual orexin receptor antagonist, J. Neurogenet 25 (1–2) (2011) 52–61.
- [14] Waters K, Review of the efficacy and safety of Lemborexant, a Dual Receptor Orexin Antagonist (DORA), in the treatment of adults with insomnia disorder, Ann. Pharm 56 (2) (2022) 213–221.
- [15] Muehlan C, et al., Clinical pharmacology, efficacy, and safety of orexin receptor antagonists for the treatment of insomnia disorders, Expert Opin. Drug Metab. Toxicol 16 (11) (2020) 1063–1078.
- [16] Roch C, Bergamini G, Steiner MA, Clozel M, Nonclinical pharmacology of daridorexant: a new dual orexin receptor antagonist for the treatment of insomnia, Psychopharmacology (Berl.) 238 (10) (2021) 2693–2708.
- [17] Markham A, Daridorexant: first approval, Drugs 82 (5) (2022) 601–607.
- [18] James MH, Bowrey HE, Stopper CM, Aston-Jones G, Demand elasticity predicts addiction endophenotypes and the therapeutic efficacy of an orexin/hypocretin-1 receptor antagonist in rats, Eur. J. Neurosci 50 (3) (2019) 2602–2612.
- [19] Barson JR, Leibowitz SF, Orexin/hypocretin system: role in food and drug overconsumption, Int. Rev. Neurobiol 136 (2017) 199–237.
- [20] Johnson PL, et al., Activation of the orexin 1 receptor is a critical component of CO(2)-mediated anxiety and hypertension but not bradycardia, Neuropsychopharmacology 37 (8) (2012) 1911–1922.
- [21] James MH, et al., Increased number and activity of a lateral subpopulation of hypothalamic orexin/hypocretin neurons underlies the expression of an addicted state in rats, Biol. Psychiatry 85 (11) (2019) 925–935.
- [22] Freeman LR, Bentzley BS, James MH, Aston-Jones G, Sex differences in demand for highly palatable foods: role of the orexin system, Int. J. Neuropsychopharmacol (2020).
- [23] Fragale JE, Pantazis CB, James MH, Aston-Jones G, The role of orexin-1 receptor signaling in demand for the opioid fentanyl, Neuropsychopharmacology 44 (10) (2019) 1690–1697.
- [24] Mohammadkhani A, James MH, Pantazis CB, Aston-Jones G, Persistent effects of the orexin-1 receptor antagonist SB-334867 on motivation for the fast acting opioid remifentanil, Brain Res. (2019) 146461.
- [25] Johnson PL, et al., Orexin 1 receptors are a novel target to modulate panic responses and the panic brain network, Physiol. Behav 107 (5) (2012) 733–742.
- [26] Martin-Fardon R, Weiss F, Blockade of hypocretin receptor-1 preferentially prevents cocaine seeking: comparison with natural reward seeking, Neuroreport 25 (7) (2014) 485–488.
- [27] Zhou JJ, et al., Downregulation of orexin receptor in hypothalamic paraventricular nucleus decreases blood pressure in obese zucker rats, J. Am. Heart Assoc 8 (13) (2019) e011434.
- [28] Laburthe M, Voisin T, The orexin receptor OX(1)R in colon cancer: a promising therapeutic target and a new paradigm in G protein-coupled receptor signalling through ITIMs, Br. J. Pharm 165 (6) (2012) 1678–1687.
- [29] James MH, Fragale JE, O’Connor SL, Zimmer BA, Aston-Jones G, The orexin (hypocretin) neuropeptide system is a target for novel therapeutics to treat cocaine use disorder with alcohol coabuse, Neuropharmacology 183 (2020) 108359, doi:10.1016/j.neuropharm.2020.108359.
- [30] Brown RM, Dayas CV, James MH, Smith RJ, New directions in modelling dysregulated reward seeking for food and drugs, Neurosci Biobehav Rev 132 (2022) 1037–1048, doi:10.1016/j.neubiorev.2021.10.043.
- [31] Smart D, et al., SB-334867-A: the first selective orexin-1 receptor antagonist, Br. J. Pharm 132 (6) (2001) 1179–1182.
- [32] McElhinny CJ Jr., et al., Hydrolytic instability of the important orexin 1 receptor antagonist SB-334867: possible confounding effects on in vivo and in vitro studies, Bioorg. Med. Chem. Lett 22 (21) (2012) 6661–6664.
- [33] Kaufmann P, et al., First-in-human study with ACT-539313, a novel selective orexin-1 receptor antagonist, Br. J. Clin. Pharm 86 (7) (2020) 1377–1386.
- [34] Kaufmann P, et al., Multiple-dose clinical pharmacology of the selective orexin-1 receptor antagonist ACT-539313, Prog. Neuropsychopharmacol. Biol. Psychiatry 108 (2021) 110166.
- [35] 〈https://www.idorsia.com/media/news-details?newsId=2748933〉.
- [36] Foldi CJ, et al., Advancing translational neuroscience research for eating disorders, Aust. N. Z. J. Psychiatry 56 (7) (2022) 739–741.
- [37] McElroy SL, et al., Efficacy, safety, and tolerability of nivasorexant in adults with binge-eating disorder: A randomized, Phase II proof of concept trial, Int J Eat Disord 56 (11) (2023) 2120–2130.
- [38] James MH, Aston-Jones G, Introduction to the Special Issue: “Making orexin-based therapies for addiction a reality: What are the steps from here?” Brain Res 1731 (2020) 146665, doi:10.1016/j.brainres.2020.146665.
- [39] Berman HM, et al., The protein data bank, Nucleic Acids Res. 28 (1) (2000) 235–242.
- [40] Rappas M, et al., Comparison of orexin 1 and orexin 2 ligand binding modes using X-ray crystallography and computational analysis, J. Med. Chem 63 (4) (2020) 1528–1543.
- [41] Yin J, Mobarec JC, Kolb P, Rosenbaum DM, Crystal structure of the human OX2 orexin receptor bound to the insomnia drug suvorexant, Nature 519 (7542) (2015) 247–250.
- [42] Yin J, et al., Structure and ligand-binding mechanism of the human OX1 and OX2 orexin receptors, Nat. Struct. Mol. Biol 23 (4) (2016) 293–299.
- [43] Gaulton A, et al., ChEMBL: a large-scale bioactivity database for drug discovery, Nucleic Acids Res. 40 (Database issue) (2012) D1100–D1107.
- [44] Law V, et al., DrugBank 4.0: shedding new light on drug metabolism, Nucleic Acids Res. 42 (D1) (2014) D1091–D1097.
- [45] Berthold MR, et al., KNIME: the Konstanz information miner. Data Analysis, Machine Learning and Applications, Springer, Berlin, Heidelberg, 2008.
- [46] Mysinger MM, Carchia M, Irwin JJ, Shoichet BK, Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking, J. Med. Chem 55 (14) (2012) 6582–6594.
- [47] Huang N, Shoichet BK, Irwin JJ, Benchmarking sets for molecular docking, J. Med. Chem 49 (23) (2006) 6789–6801.
- [48] Bender A, Glen RC, A discussion of measures of enrichment in virtual screening: comparing the information content of descriptors with increasing levels of sophistication, J. Chem. Inf. Model 45 (5) (2005) 1369–1375.
- [49] Güner OF, Henry DR, Metric for analyzing hit lists and pharmacophores. Pharmacophore Perception, Development, and Use in Drug Design, International University, 2000, pp. 191–211.
- [50] Zhang M, et al., Sirtinol promotes PEPCK1 degradation and inhibits gluconeogenesis by inhibiting deacetylase SIRT2, Sci. Rep 7 (1) (2017) 7.
- [51] Langmead CJ, et al., Characterisation of the binding of [3H]-SB-674042, a novel nonpeptide antagonist, to the human orexin-1 receptor, Br. J. Pharm 141 (2) (2004) 340–346.
- [52] Jaworska JS, Comber M, Auer C, Van Leeuwen CJ, Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints, Environ. Health Perspect 111 (10) (2003) 1358–1360.
- [53] Tropsha A, Best practices for QSAR model development, validation, and exploitation, Mol. Inf 29 (6–7) (2010) 476–488.
- [54] Peng Y, Dong H, Welsh WJ, Comprehensive 3D-QSAR model predicts binding affinity of structurally diverse sigma 1 receptor ligands, J. Chem. Inf. Model 59 (1) (2019) 486–497.
- [55] Peng Y, et al., 3D-QSAR comparative molecular field analysis on opioid receptor antagonists: pooling data from different studies, J. Med. Chem 48 (5) (2005) 1620–1629.
- [56] Peng Y, Zhang Q, Welsh WJ, Novel sigma 1 receptor antagonists as potential therapeutics for pain management, J. Med. Chem 64 (1) (2021) 890–904.
- [57] Jasial S, Hu Y, Vogt M, Bajorath J, Activity-relevant similarity values for fingerprints and implications for similarity searching, F1000Res 5 (2016).
- [58] Rombo SE, Ursino D, Integrative bioinformatics and omics data source interoperability in the next-generation sequencing era-Editorial, Brief. Bioinform 22 (1) (2021) 1–2.
- [59] Zin PPK, Williams GJ, Ekins S, Cheminformatics analysis and modeling with macrolactoneDB, Sci. Rep 10 (1) (2020) 6284.
- [60] Jiang D, et al., Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models, J. Cheminform. 13 (1) (2021) 12.
- [61] Roy K, Kar S, Das RN, Statistical methods in QSAR/QSPR. A Primer on QSAR/QSPR Modeling, Springer, 2015.
- [62] Gadaleta D, et al., Applicability domain for QSAR models: where theory meets reality, Int. J. Quant. Struct. -Prop. Relatsh. (IJQSPR) 1 (1) (2016) 45–63.
- [63] Sahigara F, et al., Comparison of different approaches to define the applicability domain of QSAR models, Molecules 17 (5) (2012) 4791–4810.
- [64] Donnelley MA, Zhu ES, Thompson GR 3rd, Isavuconazole in the treatment of invasive aspergillosis and mucormycosis infections, Infect. Drug Resist 9 (2016) 79–86.
- [65] Pettit NN, Carver PL, Isavuconazole: a new option for the management of invasive fungal infections, Ann. Pharm 49 (7) (2015) 825–842.
- [66] Grullich C, Cabozantinib: a MET, RET, and VEGFR2 tyrosine kinase inhibitor, Recent Results Cancer Res. 201 (2014) 207–214.
- [67] Durante C, Russo D, Verrienti A, Filetti S, XL184 (cabozantinib) for medullary thyroid carcinoma, Expert Opin. Invest. Drugs 20 (3) (2011) 407–413.
- [68] Escudier B, Lougheed JC, Albiges L, Cabozantinib for the treatment of renal cell carcinoma, Expert Opin. Pharm 17 (18) (2016) 2499–2504.
- [69] Voisin T, et al., Aberrant expression of OX1 receptors for orexins in colon cancers and liver metastases: an openable gate to apoptosis, Cancer Res. 71 (9) (2011) 3341–3351.
- [70] Dayot S, et al., In vitro, in vivo and ex vivo demonstration of the antitumoral role of hypocretin-1/orexin-A and almorexant in pancreatic ductal adenocarcinoma, Oncotarget 9 (6) (2018) 6952–6967.
- [71] Rimassa L, Danesi R, Pressiani T, Merle P, Management of adverse events associated with tyrosine kinase inhibitors: improving outcomes for patients with hepatocellular carcinoma, Cancer Treat. Rev 77 (2019) 20–28.
- [72] Mehr JB, Bilotti MM, James MH, Orexin (hypocretin) and addiction, Trends Neurosci. 44 (11) (2021) 852–855.
- [73] James MH, et al., Repurposing the dual orexin receptor antagonist suvorexant for the treatment of opioid use disorder: why sleep on this any longer? Neuropsychopharmacology 45 (5) (2020) 717–719.
- [74] Winter R, Montanari F, Noe F, Clevert DA, Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations, Chem. Sci 10 (6) (2019) 1692–1701.
- [75] Feng H, Jiang J, Wei GW, Machine-learning repurposing of DrugBank compounds for opioid use disorder, Comput. Biol. Med 160 (2023) 106921.
- [76] Miao Y, Ma H, Huang J, Recent advances in toxicity prediction: applications of deep graph learning, Chem. Res. Toxicol 36 (8) (2023) 1206–1226.
- [77] Karelson M, Lobanov VS, Katritzky AR, Quantum-chemical descriptors in QSAR/QSPR studies, Chem. Rev 96 (3) (1996) 1027–1044.
- [78] Gupta A, et al., Generative recurrent networks for de novo drug design, Mol. Inf 37 (1–2) (2018).
- [79] Meyer JG, et al., Learning drug functions from chemical structures with convolutional neural networks and random forests, J. Chem. Inf. Model 59 (10) (2019) 4438–4449.
- [80] Wu Z, et al., MoleculeNet: a benchmark for molecular machine learning, Chem. Sci 9 (2) (2018) 513–530.
- [81] Yang K, et al., Analyzing learned molecular representations for property prediction, J. Chem. Inf. Model 59 (8) (2019) 3370–3388.
- [82] Xiong Z, et al., Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism, J. Med. Chem 63 (16) (2020) 8749–8760.