Skip to main content
Wiley Open Access Collection logoLink to Wiley Open Access Collection
. 2018 Feb 14;13(6):614–626. doi: 10.1002/cmdc.201700754

3D‐e‐Chem: Structural Cheminformatics Workflows for Computer‐Aided Drug Discovery

Albert J Kooistra 1,2,, Márton Vass 2,, Ross McGuire 1,3, Rob Leurs 2, Iwan J P de Esch 2, Gert Vriend 1, Stefan Verhoeven 4,, Chris de Graaf 2,
PMCID: PMC5900740  PMID: 29337438

Abstract

eScience technologies are needed to process the information available in many heterogeneous types of protein–ligand interaction data and to capture these data into models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure‐based bioactivity data mapping, ii) structure‐based identification of scaffold replacement strategies for ligand design, iii) ligand‐based target prediction, iv) protein sequence‐based binding site identification and ligand repurposing, and v) structure‐based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well‐established standards allows the re‐use of these protocols and facilitates the design of customized computer‐aided drug discovery workflows.

Keywords: cheminformatics workflows, KNIME, ligand design, ligand repurposing, target prediction

Introduction

There is a need for eScience technologies to process the large volumes of rapidly generated, heterogeneous1 protein–ligand interaction data into computational models that enable the design of efficacious and safe medicines.2 The ChEMBL database (version 23), for example, contains over 14 million data entries on 11 500 protein targets, of which 4600 human, covering 1.7 million unique compounds.3 The Protein Data Bank (PDB, accessed October 21, 2017) contains more than 130 000 structures with nearly 24 000 small molecules covering 67 000 unique protein–ligand complexes.4 Currently 20 000 human proteins have been deposited in Swiss‐Prot5 (version 2017_10), of which 3300 proteins are also present in ChEMBL. Comparison of the protein, ligand, and bioactivity data in ChEMBL, PDB, and UniProt indicates that structural information is lacking for more than 95 % of the protein–ligand pairs for which bioactivity data has been reported, and for more than 75 % of the human proteins for which sequence information is available. In silico chemogenomics6 and computer‐aided drug discovery methods can be used to predict protein–ligand interactions in order to fill these bioactivity‐structure and sequence‐structure gaps, identify new protein–ligand pairs, and design new ligands.6b, 7 The success rate of such methods strongly depends on the efficient integration of chemical, pharmacological and structural data to train, optimize, and evaluate ligand‐ and protein‐based models.6b, 7a,7b An effective approach to accomplish this is through the development of scientific workflows8 that facilitate the standardization of protocols,7c the integration of data and analyses, and re‐use of parts of protocols to customize, extend, or design new workflows for different targets or applications.9 KNIME10 and Pipeline Pilot11 are established workflow managers in the field of cheminformatics and computer‐aided drug discovery, with a growing number of users.8 Several ligand‐based workflows have been reported that combine chemical and biological data sources for ligand‐based target prediction.12 Few structure‐based workflows have been reported, including protocols for pharmacophore screening,13 structure‐based ligand optimization,14 as well as combined ligand‐ and protein‐based ligand repurposing.15 Several of the tools in the reported workflows, however, use commercial computer‐aided drug discovery software that is not accessible without a paid license.15b, 16 Most freely available cheminformatics tools17 (nodes) that can be run within these workflows focus on small molecules18 and the number of nodes that use freely available structure‐based approaches is relatively scarce.

The current work describes the integration and analysis of several chemical, biological, and structural data types in workflows that can be used for: i) structure‐based bioactivity data mapping, ii) structure‐based identification of scaffold replacement strategies for ligand design, iii) ligand‐based target prediction, iv) protein sequence‐based binding site identification and ligand repurposing within a protein family, and v) structure‐based pharmacophore comparison for ligand repurposing across protein families.

The flexible workflows and protocols presented here can be used as templates for the standardization of protocols, the integration of data and analyses, and can readily be reused or extended for the creation of new computer‐aided drug discovery workflows for other protein targets and applications. The cases will focus on two of the pharmaceutically most relevant protein targets, namely G protein‐coupled receptors (GPCRs) and kinases.

Moreover, this work presents new KNIME nodes that enable the analysis and prediction of protein–ligand interactions using freely accessible structural cheminformatics tools, including: i) web service nodes to extract and combine data from GPCR (GPCRdb)23, 29 and kinase30 (KLIFS)24 focused databases, ii) nodes to set up, run, and analyze results of structural pharmacophore‐based protein binding site comparison (KRIPO),26, 31 ligand shape‐based (Shape‐it)25 and pharmacophore‐based (Align‐it)25 comparison, and molecular docking simulation (PLANTS)27 tools, and, iii) new KNIME nodes to perform amino acid sequence entropy analyses (ss‐TEA),28 align (aligner), read, and write pharmacophores (pharmacophores), and visualize protein–ligand complexes and pharmacophores in 3D (molviewer) (Figure 1).

Figure 1.

Figure 1

Overview of structural cheminformatics tools and workflows for computer‐aided drug discovery applications described in the current study. Pharmacological (ChEMBL)3 and structural (PDB)4 data on protein–ligand interactions are integrated and complemented by structural chemogenomics analyses of ligand, protein, and protein–ligand interaction features by the combination of different KNIME nodes, including small molecule ligand cheminformatics toolkits (e.g., CDK,17, 19 ChemAxon,20 Openbabel,21 RDKit),22 web service nodes to extract information from GPCR (GPCRdb)23 and kinase (KLIFS)24 focused databases, and nodes to perform ligand shape‐based (Shape‐it),25 ligand pharmacophore‐based (Align‐it),25 and protein pharmacophore‐based (KRIPO)26 similarity searches, molecular docking simulations (PLANTS),27 amino acid sequence entropy analyses (ss‐TEA),28 pharmacophore alignments (aligner), and to visualize protein–ligand complexes and pharmacophores (molviewer). Workflows for structure‐based bioactivity data mapping, ligand design, target prediction and ligand repurposing are described in the current work and provided as Supporting Information.

All nodes and tools used to perform the analyses described in the current work are available as community contributions in KNIME under “3D‐e‐Chem” (https://www.knime.com/3d-e-chem-nodes-for-knime), the source code for all nodes and all workflows themselves are available via GitHub (https://github.com/3D-e-Chem/workflows), and everything is also embedded within an updated version of our 3D‐e‐Chem virtual machine31 (https://3d-e-chem.github.io/3D-e-Chem-VM/). This enables all users to download, apply, customize, and extend the workflows to their own protein targets of interest in order to answer different chemogenomics or drug discovery related questions.

Results and Discussion

Structure‐based bioactivity data mapping of kinase inhibitors

Protein–ligand crystal structures provide information regarding protein–ligand interactions and protein conformations, whereas bioactivity data provides insight into the binding affinity or functional effect. The integration of structural and bioactivity data allows one to interpret differences and similarities in bioactivity (e.g., affinity cliffs) to ligand binding modes, specific protein–ligand interactions, and to extrapolate these insights to other protein targets. In the next workflow (Figure 2) we have combined bioactivity data from ChEMBL and (structural) kinase data from KLIFS to create a matrix of available bioactivity data on human kinases for all co‐crystallized kinase ligands.

Figure 2.

Figure 2

Structure‐based bioactivity data mapping workflow (A) of kinase inhibitors using both the KLIFS and the ChEMBL database. The heatmap (B) shows the bioactivity profile for the top 100 co‐crystallized kinase ligands with the largest amount of data available for the top 400 kinases. The kinomes, created with KinMap,32 show the number of unique kinase‐inhibitor complexes based on KLIFS (C) and the number of unique kinase inhibitors based on ChEMBL (D). The data accumulated in this workflow are summarized (E) for two well‐known kinase inhibitors, namely Seliciclib and Dasatinib (indicated with a blue and green arrow, respectively on the Y‐axis of the heatmap). *Only human kinases are listed.

Protocol:

  1. Collect protein information and the molecular structures of co‐crystallized ligands (here from KLIFS)

  2. Retrieve the available bioactivity data for the ligands (here from ChEMBL)

  3. Clean, curate, and process the bioactivity data

  4. Selection of the compounds and kinase targets of interest

  5. Formatting and visualizing the data

The molecular structures of all 2552 unique co‐crystallized small molecule kinase inhibitors were collected via KLIFS nodes (KLIFS accessed August 18th, 2017) in SMILES format. The InChIKeys of the inhibitors were subsequently used to retrieve the ChEMBL IDs for the compounds (1583 matches) including all corresponding bioactivity data (166 976 data points). Using the human kinase list from KLIFS all bioactivity data was reduced to solely the human kinome (86 601 data points for 432 kinases). The top 100 compounds with the largest number of available bioactivity data (excluding single concentration measurements) for kinases30 was then selected together with the top 400 kinases and the median log value of the bioactivity data for each unique compound–kinase pair. The data was then transformed into a matrix and visualized as a heatmap using the JFreeChart HeatMap node. The heat map shows clear differences in the bioactivity profiles between kinase inhibitors and highlights promiscuous and selective compounds as well as the gaps in the bioactivity matrix. This workflow illustrates a simple, yet powerful, method of complementing a structure‐based view of kinase inhibitors with the available pharmacological data for more advanced structural chemogenomics applications (Figure 2).

Scaffold replacements for kinase ligand design

Scaffold hopping is a common approach in which a part of a known active compound is changed while trying to maintain the binding affinity and binding mode of the original compound in order to obtain better ADMET/PKPD or physicochemical properties or to escape patent infringement.33 In the next workflow (Figure 3) protein–ligand interaction similarity6a, 34 as well as chemical similarity is used to identify molecular pairs with a low chemical similarity but a high interaction similarity, thereby providing interesting starting points for the design of hybrid molecules that have a high probability of maintaining their binding mode.

Figure 3.

Figure 3

A workflow (A) for the identification of potential scaffold replacements for kinase inhibitors while maintaining the protein–ligand interaction profile by combining protein–ligand interaction fingerprint (IFP) similarity with ligand‐based dissimilarity (ECFP‐4) analyses. The scaffold hop between an imidazopyridine inhibitor (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4DIT)35 and a carboxamide inhibitor (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4PTG),36 shown as the first entry in the table overview (B), was used to design a merged molecule (C). This design was docked into GSK3B (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4PTG)36 using the PLANTS nodes and visualized in the Ligand and Protein Viewer (D).

Protocol:

  • Collect structural information, protein–ligand interaction fingerprints (IFP), and molecular structures of the co‐crystallized ligands (here from KLIFS)

  • Perform full pairwise ligand‐based similarity and IFP similarity analyses

  • Filter the data by selecting ligand pairs with a low molecular similarity and a high interaction similarity

  • Obtain the aligned structures and compare the binding modes of the molecule pairs of interest

  • Design a scaffold hop based on the selected molecule pair and dock them into the desired protein kinase

  • Visually evaluate the obtained binding modes, compare their interaction fingerprints, or perform another binding mode comparison technique.

Starting from the KLIFS nodes all structural information on human kinases (7552 unique monomers) was downloaded including the kinase‐inhibitor interaction fingerprints (IFP) and the SMILES of the co‐crystallized kinase inhibitors. Subsequently, a group loop is started that processes all structures per individual kinase. Within the loop, a pairwise interaction‐based IFP6a, 34 and ligand‐based ECFP‐437 comparison is performed for all complexes of each kinase. The combinations are subsequently filtered for ligand pairs with a low chemical similarity (ECFP‐4 Tanimoto score <0.26) and a high interaction similarity (IFP Tanimoto score >0.75), that is, all chemically distinct ligand pairs that do have similar interactions with the kinase target are selected. From the resulting list of pairs, an imidazopyridine inhibitor (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4DIT)35 and a carboxamide inhibitor (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4PTG)36 in complex with GSK3B with a very low ligand similarity (Tanimoto ECFP‐4=0.188) and an identical protein–ligand interaction pattern (Tanimoto IFP=1.0) were selected as an example for further inspection. From both structures, the KLIFS aligned full monomer and ligand were download and subsequently visualized using the Ligands and Proteins Viewer showing the overlay of the ligands in the GSK3B binding site. These two kinase inhibitors were subsequently used to design a hybrid compound drawn in the MarvinSketch node. Finally, this design was docked into the GSK3B binding site (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4PTG) using the newly developed PLANTS27 docking nodes. Upon visual inspection of the obtained binding modes within the Ligands and Proteins viewer, a highly conserved binding mode of both parts of the hybrid design is observed. Within this workflow the chemical dissimilarity is complemented with protein–ligand interaction patterns to identify distinct molecules with similar mechanisms of binding. This combination of techniques provides new opportunities for molecular design based on known ligands and the workflow could, for example, be rewired and extended for more advanced fragment‐based replacement approaches.

Ligand‐based cross‐reactivity prediction

The derivation of similarity measures between different protein receptors may be used to explore cross‐reactivities and to explore the potential for compounds to display (useful) polypharmacology. The PP_GPCR (protein–protein association GPCR) workflow (Figure 4 A) follows methodologies used in previous efforts39 to explore the relationships between protein targets using ligand topology. This chemocentric approach involves describing the sets of ligands for each protein target by chemical fingerprint descriptors,40 and comparing the sets with each other to derive similarities between protein targets. With this approach, one can derive protein–ligand and protein–protein associations ranging from biologically expected to less obvious.

Figure 4.

Figure 4

Ligand‐based GPCR cross‐reactivity workflow (A) with selected output (C) from the nearest neighbor calculation of four of the five reference compounds. Blue boxes highlight areas where recalculated tables are provided and may be used for faster and more efficient processing. Green boxes show areas for user input and adjustment. The heatmap (B) summarizes the ligand‐based similarity overlap for all provided GPCR ligands. The protein target network (D) highlighted in a flareplot38 shows the top 500 associations between protein targets based on their shared ligand similarities (line thickness indicates the significance), the associations of the melanocortin receptors are highlighted in red.

Protocol:

  • Collect available bioactivity data for a protein family or (full) set of proteins of interest

  • Clean, curate, process, and filter the bioactivity data

  • Calculate ligand‐based fingerprint descriptors for each compound

  • Goal 1: Protein–protein association prediction

  • ○ Perform an all‐against‐all comparison of the fingerprints and select relevant hits based on a user‐definable cutoff

  • ○ Group the number of hits per protein target pair and calculate an E‐value

  • ○ Output of the results for visualization in, for example, Cytoscape41 or flareplots.38

  • Goal 2: Identification of potential protein targets for small molecules

  • ○ User input of the small molecules of interest and calculate their ligand‐based fingerprint descriptors

  • ○ Perform a fingerprint comparison against the protein dataset and select hits based on a user‐definable cutoff

  • ○ Group the number of hits per protein target and calculate an E‐value

The protocol is applicable to any combination of data sets with unknown distributions of structures and biological activity values, user intervention to vary thresholds, similarity measures, fingerprints and statistical approaches is made possible. The PP_GPCR workflow reads in data from a public data source, ChEMBL, for all non‐olfactory GPCR receptors as derived from the GPCRdb.23 Various filters for allowed activity type (EC50, IC50, AC50, K b, K D, K i) and threshold activity (pAct ≥5) are applied, a minimum compound set size of 5 is required, and a restriction on the number of calculated rotatable bonds (maximum of 18) is used to limit the number of very large, flexible compounds. The latter is performed as in our experience the presence of large numbers of peptide/peptoid compounds can lead to some targets being routinely overrepresented in later comparisons. Fingerprint descriptors (in this case RDKit: Daylight‐like topological fingerprint) were calculated for each compound and the similarities between the receptor sets were determined using a user‐definable threshold for similarity, here set to a minimum of 0.7. Use of the raw similarities and set size following Keiser39a allowed the calculation of E‐values, used to rank the similarity between protein targets. The similarities between receptors are viewable as a KNIME Table and Excel File. To highlight some of the identified similarities the top 500 protein associations were visualized in a flareplot38 (Figure 4 D) and a heatmap (Figure 4 B). The melanocortin receptors, for example, show links with opioid, endothelin, chemokine and somatostatin receptors. These associations have previously been explored by Quillan et al.42

The PP_GPCR workflow may also be used to calculate potential targets/cross‐activities for individual compounds. A compound may be entered into the workflow or, if already present in the data, simply extracted and compared with the fingerprints already present allowing the calculation of the statistical significance and ranking by E‐values. To analyze the predictive ability of the PP_GPCR workflow, the workflow was applied to five reference structures taken from Keiser et al.39b with an experimentally validated GPCR affinity (K i<1000 nm). Using the default similarity cut‐off of 0.7, for four of the five compounds (Sedalande, Dimetholazine, Xenazine and Fabhistine) previously predicted activities were reflected in the top‐five nearest neighbors in the PP_GPCR workflow (Figure 4 C). Lowering the similarity cut‐off increases the likelihood of detecting further nearest neighbors at the expense of a larger number of hits.

Sequence‐based ligand repurposing within a protein family

Sequence‐based identification of key residues for a specific protein can help with the identification of binding site residues or residues that are linked to a specific receptor function. More importantly, this information can be exploited for ligand repurposing as proteins that share similarity for these key residues can potentially bind similar ligands.43 In this workflow (Figure 5) we use a double entropy sequence analysis method (ss‐TEA) to identify these key residues, and perform a sequence‐based comparison for these residues to identify similar proteins (within the same protein family) as potential candidates for ligand repurposing.44

Figure 5.

Figure 5

Workflow (A) for the identification of ligand repurposing possibilities using a sequence‐based double entropy analysis (ss‐TEA). This example shows the identification of the opioid, serotonin, and dopamine receptors as potential repurposing targets for somatostatin type 5 inhibitors, which was retrospectively verified using ChEMBL data (C) and a literature search (F). The scatterplot (B) shows the internal entropy (X‐axis) versus the external entropy (Y‐axis) for each residue and is colored by the ss‐TEA score (the lower the more significant). Part of the summarized analysis results are shown in (C) the interactive table viewer and (D) the identified nearest proteins for SST5R are shown in the phylogenetic tree of human GPCRs. (E) A sequence alignment of solely the residues (using the Ballesteros–Weinstein residue numbering scheme)45 identified with ss‐TEA for the somatostatin receptors and highlighted cross‐reactivity targets.

Protocol:

  • Create or obtain a large sequence alignment for a protein family

  • Selection of the protein subfamily of interest

  • Perform the double entropy ss‐TEA analysis for identification of key residues for the selected subfamily

  • Extract the aligned key residues and perform a sequence comparison to identify nearest neighbors

  • Collect additional ligand and bioactivity data for the nearest neighbors

The workflow begins by gathering a complete list of all class A GPCR families (300), all class A GPCRs (11 731), and the aligned and numbered protein residues for each GPCR (4 536 590 in total) using the GPCRdb23 nodes. The structure‐based residue numbering was then used to obtain a matrix with the position‐based alignment of all GPCR residues. At this point, the user can inspect the table of GPCR families and highlight the GPCR receptor/subfamily of interest using an interactive table viewer. The user selection, in this case the Somatostatin receptor type 5 (SST5R), is then used to create a subfamily (i.e., reference group) as input for the double entropy analysis by the ss‐TEA node. All residue positions are scored according to the entropy within the subfamily (internal entropy) compared to the entropy outside the subfamily (external entropy). The 20 residue positions within the seven transmembrane helices with the lowest score (the residues with a low internal entropy, but a high external entropy) were selected for further processing. These residues have a high conservation of a residue within a subfamily but a low conservation outside a subfamily, which is an indication of the subfamily‐specific relevance of the residue for, for example, ligand recognition or receptor function. For visualization of the results, a scatterplot is created displaying the internal versus the external entropy with all residue positions (each dot) colored according to their ss‐TEA score (Figure 4 C). Subsequently, an alignment of solely the selected 20 residues is generated and used to calculate the sequence identity of the human GPCR of the subfamily to all human GPCRs. The nearest 50 GPCRs based on this ss‐TEA sequence alignment are selected and shown in an interactive table viewer as potential candidates for ligand repurposing and complemented by a list of available crystal structures in the PDB. Moreover, all ChEMBL bioactivities for each receptor are obtained and the number of active inhibitors annotated in ChEMBL is listed, including the number of known ligands that have both an affinity for the identified receptor as well as for the reference receptor. For the SST5R this selection of GPCRs logically contains the other somatostatin receptors and the closely related opioid receptors, but also the more distant dopamine as well as serotonin receptors (Figure 4). This matches with the known cross‐reactivity of some SST5R inhibitors for the μ opioid receptor, as well as the dopamine D2 receptor (D2R) and the serotonin 2B receptor (5‐HT2BR), which are also identified by the cross‐reactivity assessment using the ChEMBL bioactivities of the known SST5R inhibitors (see Figure 4 C,D). This is, for example, demonstrated by the cross‐reactivity of the marketed drugs Fluspirilene (a D2R antagonist) and Loperamide (a μ opioid agonist) on SST5R. Vice versa, a series of benzoxazole SST5R inhibitors showed nanomolar affinities for 5‐HT2BR (Figure 4 F). All these receptors share the key ionic anchor D3.32 (Figure 4 E) within the selected residues, which was deemed essential for the ligand recognition.46

Structure‐based pharmacophore comparison for ligand repurposing across protein families

Ligand repurposing across protein families can be enabled through the comparison of known protein binding sites based on the available crystal structures.47 The rationale is that proteins with similar binding sites can potentially bind similar ligands.47, 48 In this workflow (Figure 6) we compare the KRIPO binding site pharmacophores from all structures of a protein (family) of interest against the KRIPO pharmacophores of the full PDB to identify ligand‐repurposing possibilities.

Figure 6.

Figure 6

A structure‐based ligand repurposing workflow (A) that searches for KripoDB26 pharmacophore similarities between GPCRs and kinases. Two examples (B) of binding site similarities between the 5‐HT2B receptor and MAPK14 kinase, and the adenosine A2A receptor and the TTK kinase are presented and described in the main text. The aligned kinase and GPCR structures based on the alignment of the KRIPO pharmacophores are shown in 3D using the Proteins and Ligands viewer (for clarity purposes the lipophilic pharmacophore features are hidden). Only residues within 3.5 Å of the ligands are depicted and labeled according to the Ballesteros–Weinstein45 and KLIFS24 numbering scheme for GPCRs and kinases, respectively. Complementary shape‐based and pharmacophore‐based assessment of the ligands using the KNIME‐enabled Silicos‐it25 tools Shape‐it and Align‐it are performed and compared in the Ligands viewer and Pharmacophore viewer, respectively.

Protocol:

  • Collect available PDB entries for the protein families of interest

  • Obtain the KRIPO fragments information based on the PDB entries of the reference protein family and search for similar KRIPO fragments in the PDB

  • Extract similar fragments that match with PDB entries from the query protein family

  • Select interesting fragment pairs and further explore them by, for example, KRIPO pharmacophore alignment and 3D similarity comparison.

With the GPCRdb KNIME nodes, an overview of all GPCR crystal structures49 is obtained and used to query the KripoDB26b for the available pharmacophore fragment information for these structures. For all full ligand KripoDB entries a similarity search is performed with the KripoDB similar fragments node. The results are then filtered using the KLIFS nodes with an overview of all kinase crystal structures yielding an overview of GPCR pharmacophore fragments that share similarity with a kinase pharmacophore fragment based on their KripoDB fingerprints. From this list, two examples were selected that identified a possible overlap between the KRIPO pharmacophores based on a kinase and a GPCR structure. The first example is the match between the Sorafenib‐bound MAPK14 protein kinase50 (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=3HEG) and the Ergotamine‐bound 5‐HT2B receptor51 (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=4IB4), consistent with studies showing that the FDA‐approved kinase inhibitor Sorafenib has nanomolar affinity for 5‐HT2BR.52 The second example is the match between Reversine‐bound TTK protein kinase53 (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=5LJJ) and the triazolecarboximidamide‐bound A2A receptor54 (PDB ID: http://www.rcsb.org/pdb/explore/explore.do?structureId=5UIG). Reversine shows weak binding affinity for the adenosine A2A receptor, and has sub‐micromolar affinity for the homologous adenosine A3 receptor.55

The KRIPO pharmacophores of each structure were downloaded and aligned using the KripoDB pharmacophore and Align Pharmacophores nodes, respectively. The rotational matrix obtained from the alignment was then used to align both pharmacophores as well as the complete PDB entries in the pharmacophore viewer. To compare the structure‐based pharmacophore alignment of the molecules with a ligand‐based approach both molecules were aligned using a ligand‐based pharmacophore approach (Align‐it) and a shape‐based approach (Shape‐it). The SMILES of both co‐crystallized ligands were obtained from the PDB using the PDB Connector Custom Report node. Then the RDkit Add Conformers node was used to generate 30 conformations for each ligand as input for the Align‐it and Shape‐it nodes. The ligand‐based alignments were again visualized with the Pharmacophores Viewer and the Ligands and Proteins viewer. Interestingly, the urea moiety of Sorafenib binding in the back pocket of MAPK14 is aligned with the basic amine in the fused tetracyclic head of Ergotamine. This ligand alignment originates from the KRIPO pharmacophore alignment as the negatively charged centers of the conserved glutamate (E71αC.24) in the αC‐helix of MAPK14 and the key aspartate D1353.32 of 5‐HT2BR are matched.

The volume‐based Shape‐it overlay shows a good overlap (Tanimoto score=0.67) between the two compounds, however, most pharmacophore features are not aligned due to a 180‐degree flip of the core scaffold to maximize the shape overlay. The ligand‐based pharmacophore overlay using Align‐it results in a poor score (Tanimoto score=0.22) and an alignment in which the whole molecules are flipped 180 degrees, illustrating that the structure‐based KRIPO pharmacophores were key for the elucidation of this off‐target effect.

Conclusions

The presented structural cheminformatics tools and integrated workflows combine heterogeneous data analyses that enable the prediction of protein–ligand interactions and the identification of protein–protein relations. The reusable workflows provide general guidelines that can be used for the construction of automated computer‐aided drug discovery protocols, or for the customization and extension to other targets and applications:

  1. The use of well documented and amenable workflow management platforms like KNIME facilitate the construction of consistent, reproducible,1 and transferable protocols.7c The workflows can be transferred between, for example, workstations, users, and sites, and can be re‐run: i) as is, for example, when large data transfer is not feasible, or when new database versions are released; ii) with different configurations of the nodes, for example, changing ligand activity cut‐offs (Figure 2), input ligands (Figures 3, 4, 6), protein targets (Figure 5); iii) with additional/modified nodes to obtain complementary information, for example, including annotations from other databases, further analyzing results, or performing machine learning56 on the obtained data. Pre‐configured meta nodes or workflow blocks can be easily reused because the same data collection, preparation, processing and analysis steps might be required in various workflows for different purposes.

  2. KNIME contains a rich and continuously growing set of cheminformatics nodes to handle and process chemical and biological data in multiple formats. Custom nodes can be developed, such as the nodes presented in the current study, and scripts and external tools can be embedded to extend the functionalities of this toolkit in order to address a plethora of biochemical research questions, for example, structural protein–ligand interaction analysis and prediction functionalities.

  3. Carefully annotated and standardized data resources are required to perform integrated cheminformatics analyses.2a, 30, 57 However, it should be noted that the use of external databases can also present a potential pitfall as they can change content and format thereby disrupting the workflow or changing the outcome.

  4. The infrastructure of a workflow management platform such as KNIME allows for interactive checks during execution of the workflow. Checking the input and output for each step during the development of a workflow makes for easy debugging resulting a more robust and less error‐prone workflow. To enhance this process customized data visualization nodes, such as the proteins and ligands viewer and the pharmacophore viewer nodes presented in the current study, are also required to inspect the validity of, for example, docking studies, pharmacophore‐based structure alignments, and binding mode similarity assessments.

  5. Combining complementary techniques within the same workflow allows for the creation of more advanced or more accurate (consensus)58 cheminformatics workflows, for example, by combining ligand‐based on protein–ligand interaction based similarity assessments59 or by combining 2D and 3D ligand‐based similarity60 methods.

Experimental Section

Newly developed KNIME nodes: The KNIME workflows described in this article use a series of 3D‐e‐Chem KNIME nodes that have been newly developed in addition to a set of previously published 3D‐e‐Chem nodes. An overview of the new nodes is shown in the list below and the nodes themselves are discussed in more detail in the next few paragraphs.

  • Pharmacophore: Retrieval of the KRIPO pharmacophore based on the KripoDB fragment identifier.

  • Ligands Viewer: visualization of (aligned) small molecules.

  • Ligands and Proteins Viewer: the combined visualization of (aligned) small molecules and proteins

  • Proteins Viewer: visualization of (aligned) proteins

  • Pharmacophores Viewer: visualization of (aligned) pharmacophores, small molecules and proteins

  • Align pharmacophores: align the query pharmacophores to the reference pharmacophore.

  • Extract pharmacophore points: extract the points of a pharmacophore as rows.

  • Merge pharmacophore points: create pharmacophores from a table with x, y, z coordinates, pharmacophore type, alpha and optional directionality.

  • Pharmacophore from molecule: create a pharmacophore from a molecule by mapping atoms to pharmacophore points.

  • Pharmacophore to molecule: generate a molecule from a pharmacophore by mapping pharmacophore points to atoms.

  • Pharmacophore reader: reads a pharmacophore file (*.phar) in the Silicos‐it phar file format.

  • Pharmacophore writer: writes a pharmacophore to a file (*.phar) in the Silicos‐it phar file format.

  • PLANTS bindingsite: calculates the binding site definition for docking based on a reference ligand or pocket atoms of the protein.

  • PLANTS session builder: takes the protein, binding site and ligands from KNIME and creates the docking session.

  • PLANTS virtual screening: runs the actual docking itself based on the session created by the session builder.

  • PLANTS virtual screening results reader: reads the docking results into a KNIME table.

  • Align‐it: aligns molecules to a reference molecule based on their pharmacophore features and scores the alignment.

  • Align‐it Pharmacophore generator: generates pharmacophores for molecules based on their pharmacophore features.

  • Filter‐it: filters a set of molecules based on molecular property ranges.

  • Filter‐it property calculator: calculates molecular properties for a given set of molecules.

  • Qed Calculator: performs a quantitative estimation of drug‐likeness (QED) of a set of given molecules. Requires qed.py Python package to be installed

  • Shape‐it: performs a shape‐based alignment and scoring of a set of ligands to a reference ligand.

  • Strip‐it: strips a given set of molecules to its scaffold based on a user‐selected scaffold definition.

  • Ss‐TEA score: calculates the ss‐TEA score for each residue position of a sequence alignment for a set of family members.

Most of the nodes are available under the permissive Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0). The PLANTS binaries for docking (embedded within the PLANTS nodes) are freely available for academics, and the Silicos‐it source is available under the GNU Lesser General Public License v3 (https://www.gnu.org/licenses/lgpl‐3.0.en.html). A more detailed overview per node set and tool, including license information, dependencies, and their application, is given in Supporting Information Table S1.

GPCRdb nodes: The GPCRdb23 is a specialized database focused on G protein‐coupled receptors: the largest protein family that lies encoded within the human genome. Besides a comprehensive ontology, this database contains information on GPCR sequences, alignments, residue numbering schemes, crystal structures, interactions, and mutation data. The eight GPCRdb KNIME nodes, as previously described,31 provide access to this information from within KNIME and enable the integration of this data in comprehensive chemogenomics workflows.

KLIFS nodes: KLIFS contains kinase‐ligand interaction information derived from over 3900 structures covering more than 270 different kinases in complex with ≈2500 unique ligands (accessed August 2017). All kinase structures within KLIFS are curated, annotated, aligned, and processed in a systematic manner with automated weekly updates. All KLIFS content can be accessed from within KNIME using one or more of the nine KLIFS nodes from four different categories, as published in McGuire et al.31

KripoDB nodes: The pairwise pharmacophore similarity of more than half a million (sub)pockets extracted from structures in the Protein Data Bank is available in the KripoDB. KRIPO encodes pocket pharmacophores into a fuzzy 3‐point pharmacophore fingerprints that are subsequently used to assess this similarity.26a Besides the “Fragment information” and the “Similar fragments” KRIPO nodes that were previously published,26a a new KripoDB KNIME node has been added for the retrieval of the pharmacophores themselves that where used for the creation of the KRIPO fingerprints. This allows a user to obtain the pharmacophore of interest, and to align and visualize it in combination with the new set of “Pharmacophore” nodes as well as the “Pharmacophores Viewer”.

Molviewer nodes: The freely available molecule viewers in KNIME are primarily oriented at visualization of small molecules. To enable displaying proteins, protein–ligand complexes, and pharmacophores in KNIME we created a set of visualization nodes. When opening a KNIME view of one of the new viewer nodes a web browser will be opened with an interactive 3D canvas portraying the input molecule(s). There are four molecule viewer KNIME nodes: one to view a set of (aligned) small molecules (e.g., shape‐it results), one to view a set of (aligned) small molecules and proteins (e.g., for visualizing PLANTS docking results), one to view a set of (aligned) proteins (e.g., obtained from KLIFS), and one to view a set of pharmacophores and their aligned protein and/or ligands (e.g., from aligning KripoDB pharmacophores). The molecule viewer KNIME nodes supports HiLiting, which means that a selection of molecules inside the viewer can be sent to other KNIME nodes and vice versa. The web‐based molecule viewers use the NGL protein viewer61 (https://github.com/arose/ngl) as its 3D canvas and use React, Redux, and Bootstrap for controls. The KNIME nodes are written in Java. The web application files are hosted by a Jetty‐based webserver and the Jersey‐based web service, which are both embedded inside the nodes.

Pharmacophore nodes: The pharmacophores nodes are a set of KNIME nodes that enable the conversion and alignment of pharmacophores. The nodes support (directed) pharmacophore features with the following supported types: aromatic, H‐bond donor, H‐bond acceptor, lipophilic, positively charged, negatively charged and exclusion. The pharmacophores nodes comprise nodes that read and write pharmacophores in the Silicos‐it phar file format, nodes to convert a pharmacophore from or to a molecule by mapping the pharmacophore types from or to elements, nodes that convert 3D points with a type information into a pharmacophore and vice versa, and finally there is a node to align pharmacophore(s) to a reference pharmacophore. The pharmacophore alignment is performed by comparing all the point pair combinations the pharmacophores can have in common and then identifies the maximum point pair combinations using Bron–Kerbosch62 clique detection algorithm. It subsequently uses the Kabsch63 algorithm to compute the optimal translation and rotation matrices using singular value decomposition, which are then applied to the probe pharmacophores to get the aligned probe pharmacophores for each point pair combination. The pharmacophore KNIME nodes are written in Java and depend on the ejml Java library (http://ejml.org/) for matrix operations. The alignment algorithm is based on the KRIPO26 codebase.

PLANTS: PLANTS27 is a free‐for‐academics docking tool that employs an ant‐colony optimization algorithm for sampling potential ligand binding modes and uses a semi‐empirical scoring function. The PLANTS KNIME nodes are: i) binding site node to calculate the binding site definition based on the ligand molecule or pocket atoms of the protein, ii) configuration reader to read PLANTS definition files which are used for configuration and to determine the docking output file names, iii) configuration generator to generate a PLANTS config file using the nodes dialog with almost all PLANTS configuration fields divided into tabs, iv) runner, the node that executes the PLANTS executable, v) session builder, which takes the protein, binding site, and ligands from KNIME as input and writes them in a session directory as files as input for the PLANT executable, vi) virtual screening runs the PLANTS executable in screen mode and will read the files written by the session builder, and finally vii) the virtual screening results reader which reads the output files generated by the virtual screening node into KNIME. The PLANTS runner and PLANTS configuration generator KNIME nodes are written in Java and use the Mustache template library64 to write the PLANTS config file. All the other PLANTS nodes are implemented as KNIME meta nodes. A PLANTS executable for Windows, Linux and Mac OS X is bundled with the PLANTS KNIME nodes and is provided under a free academic license. The location of the PLANTS executable defaults to the bundled version, but can be overwritten in the KNIME preferences. The initialization and combination of PLANTS KNIME nodes for docking runs requires great care. Therefore, an example docking workflow has been made available at https://github.com/3D-e-Chem/knime-plants/blob/master/examples/plants-virtual-screening-example.knwf.

Silicos‐it nodes: Silicos‐it25 released several of their cheminformatics tools to the open source domain. These KNIME nodes bring their functionality to the KNIME environment. The nodes are: i) align‐it,65 which aligns molecules to a reference molecule based on their pharmacophore, ii) shape‐it,65b,65c, 66 which aligns molecules to a reference molecule based on their shape, iii) filter‐it,67 which can filter molecules with undesired properties from a compound set, iv) strip‐it, which generates the Murcko,68 Oprea,69 or Schuffenhauer70 scaffold of a molecule v) Qed,71 which calculates the Quantitative Estimation of Drug‐likeness (QED) for a (set of) molecule(s). The Silicos‐it executables are written in C++ and have OpenBabel as a dependency to read and write different molecule formats. The KNIME Silicos‐it nodes come bundled with the align‐it, filter‐it, shape‐it, strip‐it executables for Linux and Mac OS X. The location of the executable defaults to the bundled versions, but can be overwritten in the KNIME preferences. All the Silicos‐it KNIME nodes are implemented as KNIME meta nodes, except for the node that executes the actual Silicos‐it executables. The silicos‐it execute node is implemented in Java and is used by all meta nodes. The align‐it executable is wrapped into two KNIME nodes. A node to align SDF formatted molecules to a reference molecule and another node to generate pharmacophores from molecules. The align‐it KNIME nodes are part of the Silicos‐it KNIME nodes plugin. The shape‐it executable aligns molecules to a reference molecule based on their shape. The shape‐it executable is wrapped in a KNIME node, which aligns SDF formatted molecule to a reference molecule. The output of the node has the aligned molecules and alignment scores.

ss‐TEA: The ss‐TEA score28 is an abbreviation for subfamily‐specific Two Entropy Analysis score. The score is calculated for each residue position of a large sequence alignment based on a comparison of the level of conservation within a subset (i.e., a subfamily) of proteins (internal entropy) compared to all other proteins (external entropy). By identifying positions that are highly conserved within, but not outside of the subfamily, the ss‐TEA score can identify residue positions specifically related to ligand binding or protein function for that specific subset. This methodology is, however, dependent on a high quality and large quantity sequence alignment as input. The ss‐TEA algorithm has been implemented as a KNIME node, is written completely in Java and has no dependencies. The node requires a sequence alignment and a list of sequence identifiers, which will be used as the subfamily.

Workflows: All KNIME workflows described in this article, including the source code for all 3D‐e‐Chem nodes, are available from the 3D‐e‐Chem GitHub repository (https://github.com/3D-e-Chem/workflows). The individual steps of each workflow are described in more detail in the main text. All 3D‐e‐Chem nodes used to perform the analyses described in the current work are available under community contributions in KNIME under “3D‐e‐Chem” (https://www.knime.com/3d-e-chem-nodes-for-knime).

Conflict of interest

The authors declare no conflict of interest.

Supporting information

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Supplementary

Acknowledgements

The authors would like to thank Thomas E. Exner for discussion on the PLANTS nodes development, David E. Gloriam and Christian Munk for discussion on the GPCRdb nodes development, and Silicos‐IT for making their tools open source. This research was supported financially by the Netherlands eScience Center (NLeSC)/NWO (Enabling Technologies project: 3D‐e‐Chem, grant 027.014.201) to C.d.G.. A.J.K., M.V., R.L., I.J.P.d.E., and C.d.G. participate in the GPCR Consortium (https://gpcrconsortium.org).

A. J. Kooistra, M. Vass, R. McGuire, R. Leurs, I. J. P. de Esch, G. Vriend, S. Verhoeven, C. de Graaf, ChemMedChem 2018, 13, 614.

Contributor Information

Stefan Verhoeven, Email: S.Verhoeven@esciencecenter.nl.

Dr. Chris de Graaf, Email: c.de.graaf@vu.nl.

References

  • 1. Fillbrunn A., Dietz C., Pfeuffer J., Rahn R., Landrum G. A., Berthold M. R., J. Biotechnol. 2017, 261, 149. [DOI] [PubMed] [Google Scholar]
  • 2. 
  • 2a. Lusher S. J., McGuire R., van Schaik R. C., Nicholson C. D., de Vlieg J., Drug Discovery Today 2014, 19, 859–868; [DOI] [PubMed] [Google Scholar]
  • 2b. Bajorath J., Overington J., Jenkins J. L., Walters P., Future Med. Chem. 2016, 8, 1807–1813. [DOI] [PubMed] [Google Scholar]
  • 3. Bento A. P., Gaulton A., Hersey A., Bellis L. J., Chambers J., Davies M., Kruger F. A., Light Y., Mak L., McGlinchey S., Nowotka M., Papadatos G., Santos R., Overington J. P., Nucleic Acids Res. 2014, 42, D1083–D1090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Berman H. M., Westbrook J., Feng Z., Gilliland G., Bhat T. N., Weissig H., Shindyalov I. N., Bourne P. E., Nucleic Acids Res. 2000, 28, 235–242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.The UniProt Consortium, Nucleic Acids Res 2017, 45, D158–D169. [DOI] [PMC free article] [PubMed]
  • 6. 
  • 6a. Vass M., Kooistra A. J., Ritschel T., Leurs R., de Esch I. J. P., de Graaf C., Curr. Opin. Pharmacol. 2016, 30, 59–68; [DOI] [PubMed] [Google Scholar]
  • 6b. Rognan D., Mol. Inf. 2013, 32, 1029–1034. [DOI] [PubMed] [Google Scholar]
  • 7. 
  • 7a. Bajorath J., Mol. Inf. 2013, 32, 1025–1028; [DOI] [PubMed] [Google Scholar]
  • 7b. Brown J. B., Okuno Y., Marcou G., Varnek A., Horvath D., J. Comput.-Aided Mol. Des. 2014, 28, 597–618; [DOI] [PubMed] [Google Scholar]
  • 7c.E. Jacoby, B. Wroblowski, C. Buyck, J. M. Neefs, C. Meyer, M. D. Cummings, H. van Vlijmen, Mol. Inf 2017 DOI: https://doi.org/10.1002/minf.201700119. [DOI] [PubMed]
  • 8. Warr W. A., J. Comput.-Aided Mol. Des. 2012, 26, 801–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. González-Medina M., Naveja J. J., Sánchez-Cruz N., Medina-Franco J. L., RSC Adv. 2017, 7, 54153–54163. [Google Scholar]
  • 10. 
  • 10a. Berthold M. R., Cebron N., Dill F., Gabriel T. R., Kötter T., Meinl T., Ohl P., Sieb C., Thiel K., Wiswedel B. in Data Analysis, Machine Learning and Applications, Springer, Heidelberg: 2007, pp. 319–326; [Google Scholar]
  • 10b.KNIME, https://www.knime.com.
  • 11.BIOVIA, Pipeline Pilot, San Diego: Dassault Systèmes, 2017.
  • 12. 
  • 12a. Steinmetz F. P., Mellor C. L., Meinl T., Cronin M. T., Mol. Inf. 2015, 34, 171–178; [DOI] [PubMed] [Google Scholar]
  • 12b.G. Nicola, M. R. Berthold, M. P. Hedrick, M. K. Gilson, Database 2015 DOI: https://doi.org/10.1093/database/bav087; [DOI] [PMC free article] [PubMed]
  • 12c. Digles D., Zdrazil B., Neefs J. M., Van Vlijmen H., Herhaus C., Caracoti A., Brea J., Roibas B., Loza M. I., Queralt-Rosinach N., Furlong L. I., Gaulton A., Bartek L., Senger S., Chichester C., Engkvist O., Evelo C. T., Franklin N. I., Marren D., Ecker G. F., Jacoby E., MedChemComm 2016, 7, 1237–1244; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12d. Tzanetou E., Liekens S., Kasiotis K. M., Melagraki G., Afantitis A., Fokialakis N., Haroutounian S. A., Eur. J. Med. Chem. 2014, 81, 139–149; [DOI] [PubMed] [Google Scholar]
  • 12e. Riniker S., Landrum G. A., Montanari F., Villalba S. D., Maier J., Jansen J. M., Walters W. P., Shelat A. A., F1000Res 2017, 6, 1136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Steindl T. M., Schuster D., Laggner C., Chuang K., Hoffmann R. D., Langer T., J. Chem. Inf. Model. 2007, 47, 563–571. [DOI] [PubMed] [Google Scholar]
  • 14. Pearce B. C., Langley D. R., Kang J., Huang H., Kulkarni A., J. Chem. Inf. Model. 2009, 49, 1797–1809. [DOI] [PubMed] [Google Scholar]
  • 15. 
  • 15a. Meslamani J., Bhajun R., Martz F., Rognan D., J. Chem. Inf. Model. 2013, 53, 2322–2333; [DOI] [PubMed] [Google Scholar]
  • 15b. Mukherjee P., Bentzien J., Bosanac T., Mao W., Burke M., Muegge I., J. Chem. Inf. Model. 2017, 57, 2152–2160. [DOI] [PubMed] [Google Scholar]
  • 16. 
  • 16a. Corso G., Alisi M. A., Cazzolla N., Coletta I., Furlotti G., Garofalo B., Mangano G., Mancini F., Vitiello M., Ombrato R., Mol. Inf. 2016, 35, 358–368; [DOI] [PubMed] [Google Scholar]
  • 16b.J. M. Gally, S. Bourg, Q. T. Do, S. Aci-Seche, P. Bonnet, Mol. Inf 2017, 36, DOI: https://doi.org/10.1002/minf.201700023; [DOI] [PubMed]
  • 16c. Hall M. L., Jorgensen W. L., Whitehead L., J. Chem. Inf. Model. 2013, 53, 907–922. [DOI] [PubMed] [Google Scholar]
  • 17. Willighagen E. L., Mayfield J. W., Alvarsson J., Berg A., Carlsson L., Jeliazkova N., Kuhn S., Pluskal T., Rojas-Cherto M., Spjuth O., Torrance G., Evelo C. T., Guha R., Steinbeck C., J. Cheminf. 2017, 9, 33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. 
  • 18a. O'Boyle N. M., Guha R., Willighagen E. L., Adams S. E., Alvarsson J., Bradley J. C., Filippov I. V., Hanson R. M., Hanwell M. D., Hutchison G. R., James C. A., Jeliazkova N., Lang A. S., Langner K. M., Lonie D. C., Lowe D. M., Pansanel J., Pavlov D., Spjuth O., Steinbeck C., Tenderholt A. L., Theisen K. J., Murray-Rust P., J. Cheminf. 2011, 3, 37; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18b. Mazanetz M. P., Marmon R. J., Reisser C. B. T., Morao I., Curr. Top. Med. Chem. 2012, 12, 1965–1979. [DOI] [PubMed] [Google Scholar]
  • 19. Beisken S., Meinl T., Wiswedel B., de Figueiredo L. F., Berthold M., Steinbeck C., BMC Bioinf. 2013, 14, 257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.ChemAxon Ltd., Budapest, Hungary, https://chemaxon.com.
  • 21. O'Boyle N. M., Banck M., James C. A., Morley C., Vandermeersch T., Hutchison G. R., J. Cheminf. 2011, 3, 33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.G. Landrum, RDKit: Open-source cheminformatics, http://www.rdkit.org.
  • 23. Isberg V., Mordalski S., Munk C., Rataj K., Harpsoe K., Hauser A. S., Vroling B., Bojarski A. J., Vriend G., Gloriam D. E., Nucleic Acids Res. 2016, 44, D356–D364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. 
  • 24a. van Linden O. P. J., Kooistra A. J., Leurs R., de Esch I. J. P., de Graaf C., J. Med. Chem. 2014, 57, 249–277; [DOI] [PubMed] [Google Scholar]
  • 24b. Kooistra A. J., Kanev G. K., van Linden O. P., Leurs R., de Esch I. J. P., de Graaf C., Nucleic Acids Res. 2016, 44, D365–D371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Silicos-it, Open source tools: Align-it, Filter-it, Shape-it, Strip-it, Qed, Silicos-it, Wijnegem, http://silicos-it.be.s3-website-eu-west-1.amazonaws.com/.
  • 26. 
  • 26a. Wood D. J., de Vlieg J., Wagener M., Ritschel T., J. Chem. Inf. Model. 2012, 52, 2031–2043; [DOI] [PubMed] [Google Scholar]
  • 26b. Ritschel T., Schirris T. J., Russel F. G., J. Cheminf. 2014, 6, O26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. 
  • 27a. Korb O., Stützle T., Exner T. E., J. Chem. Inf. Model. 2009, 49, 84–96; [DOI] [PubMed] [Google Scholar]
  • 27b. Korb O., Stützle T., Exner T. E., Proc. IEEE Swarm Intell. Symp. 2007, 1, 115–134. [Google Scholar]
  • 28. Sanders M. P., Fleuren W. W., Verhoeven S., Beld S. van den, Alkema W., de Vlieg J., Klomp J. P., BMC Bioinf. 2011, 12, 332. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Pandy-Szekeres G., Munk C., Tsonkov T. M., Mordalski S., Harpsoe K., Hauser A. S., Bojarski A. J., Gloriam D. E., Nucleic Acids Res. 2018, 46, D440–D446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Kooistra A. J., Volkamer A. in Annual Reports in Medicinal Chemistry, Vol. 50 (Ed.: R. A. Goodnow), Academic Press, Amsterdam, 2017, pp. 197–236. [Google Scholar]
  • 31. McGuire R., Verhoeven S., Vass M., Vriend G., de Esch I. J., Lusher S. J., Leurs R., Ridder L., Kooistra A. J., Ritschel T., de Graaf C., J. Chem. Inf. Model. 2017, 57, 115–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Eid S., Turk S., Volkamer A., Rippmann F., Fulle S., BMC Bioinf. 2017, 18, 16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Böhm H. J., Flohr A., Stahl M., Drug Discovery Today Technol. 2004, 1, 217–224. [DOI] [PubMed] [Google Scholar]
  • 34. Marcou G., Rognan D., J. Chem. Inf. Model. 2007, 47, 195–207. [DOI] [PubMed] [Google Scholar]
  • 35. Lee S. C., Kim H. T., Park C. H., Lee D. Y., Chang H. J., Park S., Cho J. M., Ro S., Suh Y. G., Bioorg. Med. Chem. Lett. 2012, 22, 4221–4224. [DOI] [PubMed] [Google Scholar]
  • 36. Sivaprakasam P., Han X., Civiello R. L., Jacutin-Porte S., Kish K., Pokross M., Lewis H. A., Ahmed N., Szapiel N., Newitt J. A., Baldwin E. T., Xiao H., Krause C. M., Park H., Nophsker M., Lippy J. S., Burton C. R., Langley D. R., Macor J. E., Dubowchik G. M., Bioorg. Med. Chem. Lett. 2015, 25, 1856–1863. [DOI] [PubMed] [Google Scholar]
  • 37. Rogers D., Hahn M., J. Chem. Inf. Model. 2010, 50, 742–754. [DOI] [PubMed] [Google Scholar]
  • 38.R. Fonseca, A. Venkatakrishnan, Flareplot, https://gpcrviz.github.io/flareplot/.
  • 39. 
  • 39a. Keiser M. J., Roth B. L., Armbruster B. N., Ernsberger P., Irwin J. J., Shoichet B. K., Nat. Biotechnol. 2007, 25, 197–206; [DOI] [PubMed] [Google Scholar]
  • 39b. Keiser M. J., Setola V., Irwin J. J., Laggner C., Abbas A. I., Hufeisen S. J., Jensen N. H., Kuijer M. B., Matos R. C., Tran T. B., Whaley R., Glennon R. A., Hert J., Thomas K. L., Edwards D. D., Shoichet B. K., Roth B. L., Nature 2009, 462, 175–181; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39c. Czodrowski P., Bolick W. G., J. Chem. Inf. Model. 2016, 56, 2013–2023; [DOI] [PubMed] [Google Scholar]
  • 39d. Irwin J. J., Gaskins G., Sterling T., Mysinger M. M., Keiser M. J., J. Chem. Inf. Model. 2018, 58, 148–164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. 
  • 40a. Riniker S., Landrum G. A., J. Cheminf. 2013, 5, 26; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40b. Bender A., Expert Opin. Drug Discovery 2010, 5, 1141–1151. [DOI] [PubMed] [Google Scholar]
  • 41. Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B., Ideker T., Genome Res. 2003, 13, 2498–2504. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Quillan J. M., Sadée W., Pharm. Res. 1996, 13, 1624–1630. [DOI] [PubMed] [Google Scholar]
  • 43. Desaphy J., Bret G., Rognan D., Kellenberger E., Nucleic Acids Res. 2015, 43, D399–D404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. 
  • 44a. Oprea T. I., Bauman J. E., Bologa C. G., Buranda T., Chigaev A., Edwards B. S., Jarvik J. W., Gresham H. D., Haynes M. K., Hjelle B., Hromas R., Hudson L., Mackenzie D. A., Muller C. Y., Reed J. C., Simons P. C., Smagley Y., Strouse J., Surviladze Z., Thompson T., Ursu O., Waller A., Wandinger-Ness A., Winter S. S., Wu Y., Young S. M., Larson R. S., Willman C., Sklar L. A., Drug Discovery Today 2011, 8, 61–69; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44b. Pollastri M. P., Campbell R. K., Future Med. Chem. 2011, 3, 1307–1315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Ballesteros J. A., Weinstein H., Methods Neurosci. 1995, 25, 366–428. [Google Scholar]
  • 46. Kooistra A. J., Kuhne S., Esch I. J. P., Leurs R., Graaf C., Br. J. Pharmacol. 2013, 170, 101–126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Rognan D., Mol. Inf. 2010, 29, 176–187. [DOI] [PubMed] [Google Scholar]
  • 48. De Franchi E., Schalon C., Messa M., Onofri F., Benfenati F., Rognan D., PLoS One 2010, 5, e12214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Tautermann C. S., Bioorg. Med. Chem. Lett. 2014, 24, 4073–4079. [DOI] [PubMed] [Google Scholar]
  • 50. Namboodiri H. V., Bukhtiyarova M., Ramcharan J., Karpusas M., Lee Y., Springman E. B., Biochemistry 2010, 49, 3611–3618. [DOI] [PubMed] [Google Scholar]
  • 51. Wacker D., Wang C., Katritch V., Han G. W., Huang X. P., Vardy E., McCorvy J. D., Jiang Y., Chu M., Siu F. Y., Liu W., Xu H. E., Cherezov V., Roth B. L., Stevens R. C., Science 2013, 340, 615–619. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Lin X., Huang X. P., Chen G., Whaley R., Peng S., Wang Y., Zhang G., Wang S. X., Wang S., Roth B. L., Huang N., J. Med. Chem. 2012, 55, 5749–5759. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Hiruma Y., Koch A., Dharadhar S., Joosten R. P., Perrakis A., Proteins Struct. Funct. Bioinf. 2016, 84, 1761–1766. [DOI] [PubMed] [Google Scholar]
  • 54. Sun B., Bachhawat P., Chu M. L., Wood M., Ceska T., Sands Z. A., Mercier J., Lebon F., Kobilka T. S., Kobilka B. K., Proc. Natl. Acad. Sci. USA 2017, 114, 2066–2071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Perreira M., Jiang J. K., Klutz A. M., Gao Z. G., Shainberg A., Lu C., Thomas C. J., Jacobson K. A., J. Med. Chem. 2005, 48, 4910–4918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Martin E. J., Polyakov V. R., Tian L., Perez R. C., J. Chem. Inf. Model. 2017, 57, 2077–2088. [DOI] [PubMed] [Google Scholar]
  • 57. 
  • 57a. Kramer C., Kalliokoski T., Gedeck P., Vulpetti A., J. Med. Chem. 2012, 55, 5165–5173; [DOI] [PubMed] [Google Scholar]
  • 57b. Goldmann D., Montanari F., Richter L., Zdrazil B., Ecker G. F., Future Med. Chem. 2014, 6, 503–514; [DOI] [PubMed] [Google Scholar]
  • 57c. Hu Y., Bajorath J., Drug Discovery Today 2014, 19, 357–360. [DOI] [PubMed] [Google Scholar]
  • 58. Schultes S., Kooistra A. J., Vischer H. F., Nijmeijer S., Haaksma E. E., Leurs R., de Esch I. J. P., de Graaf C., J. Chem. Inf. Model. 2015, 55, 1030–1044. [DOI] [PubMed] [Google Scholar]
  • 59. 
  • 59a. Svensson F., Karlen A., Skold C., J. Chem. Inf. Model. 2012, 52, 225–232; [DOI] [PubMed] [Google Scholar]
  • 59b. Kooistra A. J., Vischer H. F., McNaught-Flores D., Leurs R., de Esch I. J. P., de Graaf C., Sci. Rep. 2016, 6, 28288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Zhang Q., Muegge I., J. Med. Chem. 2006, 49, 1536–1548. [DOI] [PubMed] [Google Scholar]
  • 61. Rose A. S., Hildebrand P. W., Nucleic Acids Res. 2015, 43, W576–W579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Bron C., Kerbosch J., Commun. ACM 1973, 16, 575–577. [Google Scholar]
  • 63. Kabsch W., Acta Crystallogr. Sect. A 1976, 32, 922–923. [Google Scholar]
  • 64.S. Pullara, Mustache, 2016.
  • 65. 
  • 65a. Taminau J., Thijs G., De Winter H., J. Mol. Graphics Modell. 2008, 27, 161–169; [DOI] [PubMed] [Google Scholar]
  • 65b. Grant J. A., Gallardo M., Pickup B. T., J. Comput. Chem. 1996, 17, 1653–1666; [Google Scholar]
  • 65c. Greene J., Kahn S., Savoj H., Sprague P., Teig S., J. Chem. Inf. Comput. Sci. 1994, 34, 1297–1308. [Google Scholar]
  • 66. 
  • 66a. Karney C. F., J. Mol. Graphics Modell. 2007, 25, 595–604; [DOI] [PubMed] [Google Scholar]
  • 66b.A. A. Hasan, M. A. Hasan, “Constrained gradient descent and line search for solving optimization problem with elliptic constraints”, in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP′03), Proceedings, Vol. 2, IEEE, 2003, pp. II-793.
  • 67. 
  • 67a. Ertl P., Rohde B., Selzer P., J. Med. Chem. 2000, 43, 3714–3717; [DOI] [PubMed] [Google Scholar]
  • 67b. Andrews P. R., Craik D. J., Martin J. L., J. Med. Chem. 1984, 27, 1648–1657; [DOI] [PubMed] [Google Scholar]
  • 67c. Gleeson M. P., Hersey A., Montanari D., Overington J., Nat. Rev. Drug Discovery 2011, 10, 197–208; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67d. Egan W. J., K. M. Merz, Jr. , Baldwin J. J., J. Med. Chem. 2000, 43, 3867–3877; [DOI] [PubMed] [Google Scholar]
  • 67e. Lipinski C. A., Lombardo F., Dominy B. W., Feeney P. J., Adv. Drug Delivery Rev. 2001, 46, 3–26. [DOI] [PubMed] [Google Scholar]
  • 68. Bemis G. W., Murcko M. A., J. Med. Chem. 1996, 39, 2887–2893. [DOI] [PubMed] [Google Scholar]
  • 69. Pollock S. N., Coutsias E. A., Wester M. J., Oprea T. I., J. Chem. Inf. Model. 2008, 48, 1304–1310. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Schuffenhauer A., Ertl P., Roggo S., Wetzel S., Koch M. A., Waldmann H., J. Chem. Inf. Model. 2007, 47, 47–58. [DOI] [PubMed] [Google Scholar]
  • 71. 
  • 71a. Bickerton G. R., Paolini G. V., Besnard J., Muresan S., Hopkins A. L., Nat. Chem. 2012, 4, 90–98; [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71b. Wildman S. A., Crippen G. M., J. Chem. Inf. Comput. Sci. 1999, 39, 868–873. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re‐organized for online delivery, but are not copy‐edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.

Supplementary


Articles from Chemmedchem are provided here courtesy of Wiley

RESOURCES