. 2023 Jun 23;63(13):4070–4078. doi: 10.1021/acs.jcim.3c00082

Discovery of a Novel DCAF1 Ligand Using a Drug–Target Interaction Prediction Model: Generalizing Machine Learning to New Drug Targets

Serah W Kimani†, Julie Owen, Stuart R Green, Fengling Li, Yanjun Li, Aiping Dong, Peter J Brown, Suzanne Ackloo, David Kuter, Cindy Yang, Miranda MacAskill, Stephen Scott MacKinnon, Cheryl H Arrowsmith†,‡, Matthieu Schapira†,§,*, Vijay Shahani⊥,*, Levon Halabelian†,§,*
PMCID: PMC10337664  PMID: 37350740

Abstract


DCAF1 functions as a substrate recruitment subunit for the RING-type CRL4DCAF1 and the HECT family EDVPDCAF1 E3 ubiquitin ligases. The WDR domain of DCAF1 serves as a binding platform for substrate proteins and is also targeted by HIV and SIV lentiviral adaptors to induce the ubiquitination and proteasomal degradation of antiviral host factors. It is therefore attractive both as a potential therapeutic target for the development of chemical inhibitors and as an E3 ligase that could be recruited by novel PROTACs for targeted protein degradation. In this study, we used a proteome-scale drug–target interaction prediction model, MatchMaker, combined with cheminformatics filtering and docking to identify ligands for the DCAF1 WDR domain. Biophysical screening and X-ray crystallographic studies of the predicted binders confirmed a selective ligand occupying the central cavity of the WDR domain. This study shows that artificial intelligence-enabled virtual screening methods can successfully be applied in the absence of previously known ligands.

Introduction

The modern science of making drugs has diverged into many different subdisciplines since its humble beginnings in the 19th century. At its inception, the main focus of drug discovery was isolating and understanding the effects of natural substances. Since Langley and Ehrlich proposed receptor theory,1,2 drug development has more frequently relied on rational drug design methods. In this approach, a single protein pocket is selected as the receptor, and drug candidates are designed to fit in a complementary manner to that receptor as predicted by the lock-and-key model of protein binding. In the past two decades, the application of artificial intelligence (AI) in drug design has gained popularity, largely due to an increase in available datasets and improved computational processing power. However, the lock-and-key model still prevails as the guiding principle for drug designers.

The limitations of structure-based drug design (SBDD) and ligand-based drug design (LBDD) have been reviewed previously; they have prompted the search for new predictive solutions and led to the recent emergence of drug–target interaction (DTI) prediction models.3 Relative to SBDD techniques, DTI models inherit several key functional advantages of their LBDD counterparts, including computational efficiency and the ability to learn continuously from observed outcomes, which enable much larger chemical library screens and counter-screening strategies. DTI models go beyond standard LBDD approaches, however, by integrating protein representations as a means to generalize training and inference across protein targets (Figure 1). In this study, we demonstrate the application of a DTI-based machine learning model, MatchMaker, to the discovery of a small-molecule compound that binds the WD40-repeat (WDR) domain of human DDB1-Cul4 associated factor 1 (DCAF1), a previously ligand-less target.

Figure 1.


Conceptual diagram highlighting differences between ligand-based drug discovery (LBDD) models and drug–target interaction (DTI) models. LBDD approaches train a separate machine learning model for each protein, which limits inference (prediction) to targets that already have sufficient data for training. DTI models train a single global model to predict binding drug–target pairs, so that each protein target can learn from the bioactivities of similar proteins.

DCAF1 is a 1507-amino acid protein composed of several domains and motifs (Figure S1), most of which are involved in protein–protein interactions. The WDR domain studied here is located in the C-terminal region of DCAF1 (residues 1081–1388) and contains seven WD40 repeats. Each repeat spans 40–60 amino acid residues and folds into a “blade” of four antiparallel β-strands, and the seven blades assemble into a doughnut-shaped β-propeller structure4 (Figure S1).

DCAF1 is mainly involved in the ubiquitin proteasome-mediated protein degradation pathway and therefore plays an important role in cellular homeostasis. DCAF1 is the substrate recognition subunit for two E3 ubiquitin ligases: the RING-type Cullin RING ligase 4 (CRL4) complex (with which it has been extensively characterized)5,6 and the HECT-type EDVP (UBR5/DYRK2) E3 ligase complex first identified by Maddika and Chen.7 In both cases, DCAF1 interacts with the adaptor protein DNA damage-binding protein 1 (DDB1) via its WDR and helix–loop–helix (HLH) H-box modules8,9 (see domains in Figure S1). In this configuration, the WDR domain provides a planar, solvent-exposed structure that can bind substrate proteins on the top surfaces, on the sides, and inside the central channel of the doughnut ring.4 None of the identified substrates is shared between the CRL4DCAF1 and EDVPDCAF1 E3 ligase complexes, highlighting DCAF1 as a unique protein that can serve two distinct E3 ubiquitin ligases;10 this makes it an attractive target for the development of targeted protein degraders, including proteolysis targeting chimeras (PROTACs).11

A common strategy used by pathogenic viruses to override protective host cellular processes is to hijack E3 ubiquitin ligases.6 Indeed, DCAF1 is targeted by the primate lentiviral (HIV and SIV) accessory proteins Vpr and Vpx,12 which bind to its WDR domain and recruit protective host proteins for ubiquitination by the CRL4DCAF1 complex and subsequent proteasomal degradation. These paralogous lentiviral accessory proteins have been shown to play distinct roles: Vpr manipulates targets to cause G2 phase cell cycle arrest, thus allowing viral propagation, whereas Vpx acts to overcome host restriction factors, thus enabling viral infectivity.10 Vpr has also been shown to hijack the EDVPDCAF1 complex to disrupt centrosome homeostasis, thereby contributing to HIV pathogenesis.13 DCAF1-associated E3 ligases are therefore attractive targets for the development of protein–protein interaction inhibitors that disrupt DCAF1–Vpr/x interactions.

DCAF1 is involved in regulating a variety of normal physiological processes, including cell proliferation and survival, cell cycle progression, DNA replication, DNA damage responses, and microRNA biogenesis (reviewed by Nakagawa et al. and Schabla et al.).10,14 DCAF1 affects these processes either as a standalone protein or as part of its E3 ligase complexes, regulating the respective target proteins at the transcriptional and/or protein level. Importantly, owing to its role in cell proliferation and survival, deregulation of DCAF1 has been shown to promote tumorigenesis; reduced levels of tumor suppressors such as p53 and its target genes are a common mechanism directly linked to DCAF1’s roles in protein ubiquitination and gene silencing.15,16 Accordingly, depletion of DCAF1 in some cancer cell lines has been shown to increase expression of tumor suppressor proteins.15,16

To enable experimental investigation of DCAF1 as a therapeutic or PROTAC target, we used AI to identify a DCAF1 ligand and confirmed its binding pose crystallographically. The reversible binding mode of this compound complements the recently reported covalent DCAF1 ligands and PROTAC.17

Experimental Methods

Computational Screening

Compound Library and Procurement

The molecules described in this study were selected from the Enamine Screening Collection compound library, which was accessed from Enamine’s website on June 15, 2020 (https://enamine.net/compound-collections/screening-collection). The library contained 2,878,172 off-the-shelf molecules. All compounds selected for testing in the hit identification stage of the study were procured directly from Enamine (Kyiv, Ukraine).

Human Proteome and DCAF1 Pockets

Each MatchMaker model is released with a reference human proteome that enumerates plausible ligand binding sites on experimental and predicted protein structures. The 2020Q2 reference proteome used in this study contains 29,293 pockets from 8525 PDB and SwissModel18 cocomplex structures and has been described previously.19,20 At the time of this study, DCAF1 had no known ligand, so DCAF1 pockets were added to the proteome using structure-based predictions. Multiple plausible DCAF1 pockets were detected using P2Rank v2.1,21 operating on the 3D structure of the DCAF1-SAMHD1-Vpx cocomplex (PDB code 4CC9).22 Three pockets were chosen on the basis of the P2Rank predictions and the DCAF1/Vpx interface in this cocomplex structure, and they were named according to their location on the WDR domain (inside, top, and side; Figure 2D–F). For each pocket, P2Rank provided a single Cartesian coordinate representing the pocket center in the reference frame of the 3D structure. These three pockets served as the primary design objective for the computational hit discovery workflow (Figure 2).
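Pocket selection combined P2Rank pocket centers with the DCAF1/Vpx interface. The following minimal Python sketch, which is not the workflow used in the study, shows one way to make that comparison: it flags pocket centers that lie within a distance cutoff of any Vpx atom. The 8 Å cutoff, the helper names, and the toy coordinates are all hypothetical; real coordinates would come from the P2Rank output and PDB 4CC9.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def min_distance(center: Point, atoms: List[Point]) -> float:
    """Smallest Euclidean distance (in angstroms) from a pocket center to any atom."""
    return min(math.dist(center, atom) for atom in atoms)


def pockets_at_interface(
    pocket_centers: Dict[str, Point],
    partner_atoms: List[Point],
    cutoff: float = 8.0,  # hypothetical proximity cutoff, angstroms
) -> Dict[str, float]:
    """Return {pocket: distance} for pocket centers within `cutoff` of the partner."""
    hits: Dict[str, float] = {}
    for name, center in pocket_centers.items():
        d = min_distance(center, partner_atoms)
        if d <= cutoff:
            hits[name] = round(d, 2)
    return hits


# Toy coordinates for illustration only (not taken from PDB 4CC9).
centers = {"p1": (0.0, 0.0, 0.0), "p2": (0.0, 0.0, 12.0), "p3": (25.0, 0.0, 0.0)}
vpx_atoms = [(3.0, 4.0, 0.0), (0.0, 0.0, 18.0)]
print(pockets_at_interface(centers, vpx_atoms))  # {'p1': 5.0, 'p2': 6.0}
```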

Figure 2.


Pocket selection. (A) Visualization of P2Rank predicted pockets. (B,C) Cartoon representation of Vpx from the ternary complex of DCAF1-SAMHD1-Vpx (PDB code 4CC9) showing overlay of pocket prediction with protein contacts. (D) Inside pocket. (E) Top pocket. (F) Side pocket.

Modeling Drug–Target Interactions

Drug–target interactions were modeled using Cyclica’s 2020Q2 release of the MatchMaker model.19 MatchMaker is a neural network trained to discriminate bioactive drug–target pairs from randomized pairs.3 The network’s input layer concatenates ligand and protein representations: ligands are represented by a combination of molecular descriptors and fingerprints, and proteins by functional annotations retrieved from UniProt23 together with structural descriptors of the target’s ligand binding site. Positive training examples were obtained by mapping bioactive drug–target interaction data onto available 3D protein structures sourced from the Protein Data Bank24 or SwissModel.18 Specifically, DTI binding sites were inferred on the basis of chemical similarity to known cocrystal ligands or to cocrystal ligands superimposed from homologous proteins. Each positive drug–target pair was shuffled 19 times to generate negative training examples, giving a 1:19 ratio of positive to negative examples.
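To make the training-set construction concrete, here is a minimal sketch of randomized negative sampling and input concatenation. It assumes that "shuffling" means re-pairing each ligand with randomly drawn pockets it is not known to bind; the text does not specify which side of the pair is shuffled. The function names and toy feature vectors are hypothetical stand-ins for MatchMaker's descriptors, fingerprints, UniProt annotations, and binding-site features.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (ligand_id, pocket_id); hypothetical identifiers


def make_negatives(positives: List[Pair], pockets: List[str],
                   n_neg: int = 19, seed: int = 0) -> List[Pair]:
    """Draw n_neg randomized pairs per positive pair (assumed scheme).

    Each ligand is re-paired with pockets sampled uniformly (with replacement)
    from those not observed to bind it in the positive set.
    """
    rng = random.Random(seed)
    known = set(positives)
    negatives: List[Pair] = []
    for ligand, _ in positives:
        candidates = [p for p in pockets if (ligand, p) not in known]
        if not candidates:
            raise ValueError(f"no non-binding pockets left for {ligand}")
        negatives.extend((ligand, rng.choice(candidates)) for _ in range(n_neg))
    return negatives


def pair_features(lig_feats: Dict[str, List[float]],
                  prot_feats: Dict[str, List[float]],
                  pair: Pair) -> List[float]:
    """Input-layer vector: ligand features followed by protein/pocket features."""
    ligand, pocket = pair
    return lig_feats[ligand] + prot_feats[pocket]


positives = [("lig1", "pA"), ("lig2", "pB")]
negatives = make_negatives(positives, ["pA", "pB", "pC", "pD"])
print(len(negatives))  # 38, i.e. 19 negatives per positive
```

Sampling with replacement keeps the 1:19 ratio exact even when few non-binding pockets are available, at the cost of occasional duplicate negatives.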

DTI Evaluations

Novel DCAF1 binders were discovered using a DTI screening workflow driven by Cyclica’s 2020Q2 release MatchMaker model. MatchMaker evaluated all interactions between 2,878,172 molecules from the Enamine Screening Collection and 29,293 total protein pockets (84.3 billion total inferences) comprising the human proteome including the three predicted DCAF1 pockets. A proteome binding profile was generated for each Enamine molecule by sorting all evaluated pockets according to their respective MatchMaker scores and selecting the top-ranked pocket to represent each protein’s score. DCAF1’s rank within the proteome binding profile was used as a MatchMaker signal specificity metric (see Candidate Selection). Inference was performed using 13 instances of 16 CPU, 60 GB virtual machines over 24 h using first-generation Intel Skylake or Intel Xeon E5-series processors (approximately 4700 evaluations per CPU-second).
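The proteome binding rank can be illustrated with a short sketch: for one molecule, each protein takes the score of its best pocket, proteins are sorted by that score, and the target's position in the sorted list is its rank. The function name and the toy scores below are hypothetical and are not MatchMaker output.

```python
from typing import Dict, Tuple


def proteome_rank(pocket_scores: Dict[Tuple[str, str], float], target: str) -> int:
    """Rank (1 = best) of `target` in one molecule's proteome binding profile.

    `pocket_scores` maps (protein, pocket) to a score for a single molecule.
    Each protein is represented by its top-scoring pocket, and proteins are
    then sorted by that score in descending order.
    """
    best: Dict[str, float] = {}
    for (protein, _pocket), score in pocket_scores.items():
        best[protein] = max(score, best.get(protein, float("-inf")))
    ordered = sorted(best, key=lambda p: best[p], reverse=True)
    return ordered.index(target) + 1


# Toy scores for a single molecule (not MatchMaker output).
scores = {
    ("DCAF1", "inside"): 0.91,
    ("DCAF1", "top"): 0.40,
    ("WDR5", "central"): 0.55,
    ("BRD4", "BD1"): 0.95,
    ("EGFR", "ATP"): 0.30,
}
print(proteome_rank(scores, "DCAF1"), proteome_rank(scores, "WDR5"))  # 2 3
```

For scale, 2,878,172 molecules × 29,293 pockets ≈ 8.43 × 10^10 scores; spread over 13 × 16 = 208 CPUs for 24 h (86,400 s), that is roughly 4,700 evaluations per CPU-second, consistent with the throughput quoted above.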

Candidate Selection

The first selection criterion was primary target engagement based on MatchMaker binding probabilities. The 10,000 top-scoring compounds for each of the three DCAF1 pockets were pooled, yielding a set of 23,261 unique molecules. Pooled molecules were then filtered on the basis of their MatchMaker proteome binding profiles to avoid compounds with nonspecific DCAF1 binding signals: compounds whose DCAF1 proteome binding rank for the inside pocket (Figure 2D) was larger than 101 were excluded, leaving 4321 molecules. Next, compounds were manually assessed, and those with poor physicochemical properties were removed; compounds were retained only if they had ≤12 rotatable bonds, ≤10 H-bond acceptors, and a molecular weight below 550 Da. Because MatchMaker performs pose-independent DTI predictions, the remaining 2225 compounds were subjected to a final molecular docking ranking step for the inside and top pockets (see Figure 2) using default parameters in ICM-Pro (v. 3.8-2c, MolSoft, CA, USA). Finally, 101 compounds were selected for in vitro testing on the basis of favorable docking scores and inspection of docked poses.
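The selection funnel can be expressed as two small filters, sketched below under stated assumptions: pooling keeps the union of the top-N molecules per pocket, and the cutoffs mirror the thresholds given above (inside-pocket rank ≤ 101, ≤ 12 rotatable bonds, ≤ 10 H-bond acceptors, molecular weight below 550 Da). In the study the physicochemical assessment was manual and was followed by docking in ICM-Pro; neither step is reproduced here. The `Candidate` fields and helper names are hypothetical, and property values would in practice come from a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Candidate:
    mol_id: str
    inside_rank: int   # DCAF1 proteome binding rank for the inside pocket
    rot_bonds: int     # number of rotatable bonds
    hba: int           # number of H-bond acceptors
    mw: float          # molecular weight, Da


def pool_top(scores_by_pocket: Dict[str, Dict[str, float]], top_n: int) -> Set[str]:
    """Union of the top_n highest-scoring molecules for each pocket."""
    pooled: Set[str] = set()
    for scores in scores_by_pocket.values():
        ranked = sorted(scores, key=lambda m: scores[m], reverse=True)
        pooled.update(ranked[:top_n])
    return pooled


def passes_filters(c: Candidate, max_rank: int = 101) -> bool:
    """Specificity cutoff plus the physicochemical cutoffs described above."""
    return (c.inside_rank <= max_rank
            and c.rot_bonds <= 12
            and c.hba <= 10
            and c.mw < 550.0)


# Toy per-pocket scores (hypothetical molecules a-d).
pooled = pool_top({"inside": {"a": 0.9, "b": 0.8, "c": 0.1},
                   "top": {"b": 0.7, "d": 0.6, "a": 0.2}}, top_n=2)
print(sorted(pooled))  # ['a', 'b', 'd']
```

With top_n = 10,000 and three pockets the union could hold up to 30,000 molecules; the 23,261 unique molecules reported above reflect overlap between the per-pocket lists.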

Biophysical Characterization of Computational Hits

Protein Expression and Purification

The selected compounds were tested against the WDR domain of human DCAF1 and, as a negative control, against human WDR5 (another WDR protein). For DCAF1 expression, a DNA fragment encoding human DCAF1 residues 1038–1400 was amplified by PCR and subcloned into an in-house insect cell expression vector, pFBD-BirA (a derivative of the pFastBac Dual vector from Invitrogen), which adds an N-terminal AviTag and a C-terminal His6-tag and coexpresses BirA. The resulting plasmid was transformed into DH10Bac™ competent Escherichia coli (E. coli) cells (Invitrogen), and the recombinant viral DNA bacmid was purified and used for expression of biotinylated DCAF1 protein in a baculovirus-Sf9 expression system, as described by Hutchinson and Seitova,25 using biotin-supplemented media. For WDR5 expression, DNA encoding residues 2–334 of human WDR5 was cloned into an in-house E. coli expression vector, pNIC-Bio2, which adds an N-terminal His10-tag followed by a TEV cleavage site and a C-terminal AviTag. Biotinylated WDR5 protein was expressed in E. coli using biotin-supplemented media. Both proteins were purified using a similar protocol. Briefly, cells were harvested and lysed, and the proteins were purified on a pre-equilibrated HisPur Ni-NTA resin (Thermo Scientific) column followed by gel filtration on an ÄKTA PURE system (GE Healthcare) using a Superdex200 26/600 column (GE Healthcare) pre-equilibrated with 20 mM Tris–HCl pH 8.0, 150 mM NaCl, 5% glycerol, and 2 mM 2-mercaptoethanol for DCAF1, or with 50 mM Tris–HCl pH 8.0, 150 mM NaCl, and 0.5 mM TCEP for WDR5. The yield of purified biotinylated protein was 7 mg/L for DCAF1 and 9.8 mg/L for WDR5.

Surface Plasmon Resonance (SPR) Binding Studies

SPR studies were performed using a Biacore T200 (GE Health Sciences, Inc.) at 20 °C. Biotinylated DCAF1 (1038–1400 aa) and WDR5 control (2–334 aa) proteins were each captured onto one flow cell of a streptavidin-conjugated SA chip (according to the manufacturer’s protocol), reaching 7000 response units (RU), while another flow cell was left empty for reference subtraction. Compounds were dissolved in 100% DMSO to give 20 mM stocks and diluted to working concentrations in 100% DMSO; 3-fold serial dilutions were then prepared in buffer containing DMSO, yielding five concentrations. For SPR analysis, compounds were diluted in HBS-EP+ (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, and 0.05% Tween-20) to a final DMSO concentration of 1%. Kinetic experiments were performed using single-cycle kinetics with an association time of 60 s, a dissociation time of 120 s, and a flow rate of 75 μL/min. Curve fitting and KD calculations were performed with a steady-state affinity model using Biacore T200 Evaluation software (GE Health Sciences, Inc.).
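Steady-state affinity analysis fits the equilibrium response to a 1:1 binding isotherm, R_eq = Rmax·C/(KD + C). The sketch below is a simplified stand-in for the Biacore evaluation software (no offset term, noise-free synthetic data): because the model is linear in Rmax for a fixed KD, Rmax has a closed-form least-squares value, and KD is found by a log-spaced grid search. The 200 μM top concentration and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def serial_dilution(top: float, fold: float = 3.0, n: int = 5) -> List[float]:
    """Concentrations of an n-point serial dilution, highest first."""
    return [top / fold ** i for i in range(n)]


def fit_steady_state(conc: Sequence[float], resp: Sequence[float],
                     kd_min: float = 0.1, kd_max: float = 1e4,
                     n_grid: int = 2000) -> Tuple[float, float]:
    """Fit R_eq = Rmax * C / (KD + C) and return (KD, Rmax).

    For each KD on a log-spaced grid, Rmax is the closed-form least-squares
    value; the grid point with the smallest squared error is kept.
    """
    best_sse, best_kd, best_rmax = math.inf, kd_min, 0.0
    log_step = math.log10(kd_max / kd_min) / (n_grid - 1)
    for i in range(n_grid):
        kd = kd_min * 10 ** (i * log_step)
        f = [c / (kd + c) for c in conc]
        rmax = sum(fi * ri for fi, ri in zip(f, resp)) / sum(fi * fi for fi in f)
        sse = sum((ri - rmax * fi) ** 2 for fi, ri in zip(f, resp))
        if sse < best_sse:
            best_sse, best_kd, best_rmax = sse, kd, rmax
    return best_kd, best_rmax


# Synthetic, noise-free responses for KD = 70 uM and Rmax = 50 RU.
conc = serial_dilution(200.0)  # hypothetical top concentration, uM
resp = [50.0 * c / (70.0 + c) for c in conc]
kd, rmax = fit_steady_state(conc, resp)
print(round(kd), round(rmax))  # 70 50
```

In this toy example the top concentration is under three times KD, so the response reaches only about 74% of Rmax; in that regime a fitted KD is best reported as an estimate, as in Figure 3.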

Protein Crystallography

DCAF1 WDR Domain Gene Cloning, Protein Expression, and Purification

A gene fragment encoding the human DCAF1 WDR domain (UniProtKB Q9Y4B6; residues 1077–1390), with residues 1077 (Phe) and 1079 (Arg) mutated to alanine, was subcloned into an in-house insect cell expression vector, pFBOH-MHL, yielding an expression construct with an N-terminal His6-tag followed by a TEV cleavage site. DCAF1 protein was expressed in a baculovirus-Sf9 expression system as described by Hutchinson and Seitova.25 The harvested cells were resuspended in lysis buffer (50 mM Tris pH 7.5, 0.4 M NaCl, 5% glycerol, protease inhibitor cocktail, 0.1% NP40, and benzonase endonuclease) and lysed by sonication. The supernatant (cell-free extract) was collected after centrifugation.

Protein purification was performed by immobilized metal ion affinity chromatography (IMAC). Briefly, the supernatant was incubated with TALON cobalt affinity resin equilibrated with binding buffer (50 mM Tris–HCl pH 7.5, 0.4 M NaCl, and 5% glycerol). After binding, the resin was washed with binding buffer and then twice more with binding buffer supplemented with 5 and 10 mM imidazole, respectively. The protein was eluted with a buffer containing 50 mM Tris–HCl pH 7.5, 0.4 M NaCl, 5% glycerol, and 250 mM imidazole. The eluted DCAF1 protein was dialyzed overnight into 50 mM Tris–HCl pH 7.5, 0.4 M NaCl, 5% glycerol, and 10 mM β-mercaptoethanol in the presence of TEV protease to remove the polyhistidine purification tag. The sample was then applied to TALON resin again, and the unbound (cleaved) protein was collected.

The collected protein was concentrated and loaded onto a HiLoad 26/600 Superdex 200 gel filtration column on an ÄKTA Pure chromatography system (GE Healthcare) running in the final protein buffer (20 mM Tris–HCl pH 7.5, 150 mM NaCl, and 1 mM TCEP). Fractions containing pure DCAF1 protein, as confirmed by SDS-PAGE, were pooled and concentrated using 10 kDa cutoff spin columns (Millipore). The final protein concentration was determined using a NanoDrop UV–vis spectrophotometer (Thermo Scientific) with a DCAF1 extinction coefficient of 35,410 M⁻¹ cm⁻¹, computed from the amino acid sequence using Expasy ProtParam (https://web.expasy.org/protparam/). The purified protein was concentrated to 25 mg/mL, with a total protein yield of 1.6 mg/L.
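The NanoDrop measurement converts A280 to concentration via the Beer–Lambert law. A minimal sketch follows; the molecular mass of about 35.56 kDa is an assumption inferred from the crystallization protocol's statement that 10 mg/mL corresponds to 0.2812 mM, and the A280 reading of 0.5 is a made-up example.

```python
def protein_conc_mg_per_ml(a280: float, ext_coeff: float, mw_da: float,
                           path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c [mol/L] = A280 / (epsilon * l).

    Multiplying mol/L by the molecular mass in g/mol gives g/L, which is
    numerically equal to mg/mL.
    """
    molar = a280 / (ext_coeff * path_cm)
    return molar * mw_da


EPSILON_DCAF1 = 35_410.0  # M^-1 cm^-1, the ProtParam value quoted above
MW_DCAF1 = 35_562.0       # Da; inferred from 10 mg/mL = 0.2812 mM (assumption)

print(round(protein_conc_mg_per_ml(0.5, EPSILON_DCAF1, MW_DCAF1), 3))  # 0.502
```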

Protein Crystallization

Purified DCAF1 protein was cocrystallized with CYCA-117-70 using a precipitant solution containing 25% PEG3350, 0.1 M ammonium sulfate, and 0.1 M Bis-Tris pH 5.5. Briefly, protein in gel filtration buffer (20 mM Tris–HCl pH 7.5, 150 mM NaCl, and 1 mM TCEP) at 10 mg/mL (0.2812 mM) was mixed with 4.218 mM CYCA-117-70 (15-fold molar excess) and incubated at room temperature for 30 min before crystallization setup. Equal volumes of protein–compound complex and precipitant solution were combined in 1 μL sitting drops over 90 μL of reservoir solution for vapor diffusion. Crystals appeared within 72 h at 18 °C.
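As a worked check of the quantities above, the ligand concentration follows directly from the 15-fold molar excess, and the stated protein concentrations imply a molecular mass of roughly 35.6 kDa (an inference from the numbers given, not a separately reported value):

```latex
c_{\text{CYCA-117-70}} = 15 \times c_{\text{DCAF1}} = 15 \times 0.2812\ \text{mM} = 4.218\ \text{mM},
\qquad
M_{\text{DCAF1}} \approx \frac{10\ \text{g L}^{-1}}{0.2812\ \text{mmol L}^{-1}} \approx 35.6\ \text{g mmol}^{-1} = 35.6\ \text{kDa}
```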

Diffraction Data Collection, Structure Determination, and Refinement

Crystals were briefly soaked in a cryoprotectant solution consisting of the crystallization mother liquor supplemented with 10% ethylene glycol and 1 mM compound before flash-cooling in liquid nitrogen. Diffraction data were collected on beamline 24-ID-C at the Advanced Photon Source, Argonne National Laboratory. The data were indexed and integrated using HKL3000.26 The structure was solved by molecular replacement with Phaser27 using the DCAF1 crystal structure (PDB ID 4PXW) as the starting model. Refinement was performed by alternating rounds of manual rebuilding in Coot28 and refinement with Refmac29 within the CCP4 crystallography suite.30 The MolProbity server31 was used for model validation before deposition. The structure was analyzed using UCSF Chimera,32 and molecular graphics images were rendered using PyMOL.33 The compound CYCA-117-70 was assigned the ligand code B1I, and the model coordinates were deposited in the RCSB PDB under PDB ID 7SSE.

Results

Candidate Selection and Experimental Confirmation of the Hit Compound CYCA-117-70

Candidate compounds were selected for in vitro screening on the basis of MatchMaker binding probabilities, the rank of these probabilities across the MatchMaker proteome (proteome binding rank), physicochemical properties, and molecular docking. From the virtual screening campaign, 101 commercially available compounds were procured, and their binding to the DCAF1 WDR domain was measured by surface plasmon resonance (SPR). Among them, we identified an initial hit, CYCA-117-70, which binds DCAF1 with an estimated KD of 70 μM (Figure 3) and a calculated ligand efficiency (LE)34 of 0.21 (tested at 293 K). CYCA-117-70 showed no significant binding to WDR5, another WDR-family protein, indicating that it is selective for the DCAF1 WDR domain. This finding agrees with the MatchMaker proteome screening results for this compound, which ranked DCAF1 15th and WDR5 927th. We observed a discrepancy in binding affinity between the original and reordered batches of CYCA-117-70: the reordered batch was less soluble and appeared less potent than the original batch, which was the one crystallized with DCAF1.
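Ligand efficiency normalizes binding free energy by molecular size. Assuming the common definition LE = −ΔG/N_heavy with ΔG = RT ln KD (the exact formula of ref 34 is not restated here), the sketch below reproduces the reported value. The heavy-atom count of 26 is an assumption: it matches the 26 ligand nonhydrogen atoms listed in Table 1 but is not stated explicitly for CYCA-117-70.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 293.0) -> float:
    """LE = -dG / N_heavy with dG = RT ln(KD), in kcal/mol per heavy atom."""
    delta_g = R_KCAL * temp_k * math.log(kd_molar)  # negative when KD < 1 M
    return -delta_g / heavy_atoms


# KD = 70 uM at 293 K; the 26 heavy atoms are an assumed count (see lead-in).
print(round(ligand_efficiency(70e-6, 26), 2))  # 0.21
```

At 293 K, RT ≈ 0.582 kcal/mol, so a 70 μM KD corresponds to ΔG ≈ −5.57 kcal/mol, or about 0.21 kcal/mol per heavy atom for 26 heavy atoms.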

Figure 3.


Binding of CYCA-117-70 to DCAF1 and WDR5 by SPR. (A) Binding of CYCA-117-70 to the DCAF1 WDR domain. (B) Binding to WDR5. SPR binding data (representative of N = 2) are shown as the steady-state response (black circles) with the steady-state 1:1 binding model fit (red dashed line), and as the sensorgram (solid green) with the kinetic fit (black dots). CYCA-117-70 bound DCAF1 with an estimated KD of 70 μM (an estimate because the binding curve does not fully reach saturation) and showed no significant binding to WDR5 (KD not determined).

Crystal Structure of the Human DCAF1 WDR Domain Bound to CYCA-117-70

To characterize the CYCA-117-70 interaction with DCAF1, we determined the 1.62 Å resolution structure of CYCA-117-70 bound to the human DCAF1 WDR domain (residue range of 1077 to 1390), where residues 1077 and 1079 were mutated to alanine to promote crystallization, referred to here as DCAF1-CYCA-117-70 (PDB ID 7SSE). The two mutated residues were identified following several rounds of construct design and crystal optimization. Table 1 summarizes the crystallographic data collection, refinement, and validation statistics.

Table 1. Data Collection and Refinement Statisticsᵃ

  DCAF1-CYCA-117-70
PDB ID 7SSE
Wavelength (Å) 0.9791
Resolution range (Å) 50–1.62 (1.65–1.62)
Space group P 1 21 1
Unit cell (Å) 48.956, 87.919, 73.878
Total reflections 269,845
Unique reflections 76,489
Multiplicity 3.5 (2.1)
Completeness (%) 97.4 (78.4)
Mean I/sigma (I) 30.92 (1.45)
R-merge 0.041 (0.516)
R-meas 0.048 (0.651)
R-pim 0.024 (0.390)
CC1/2 0.998 (0.666)
CC* 0.999 (0.894)
Reflections used in refinement 72,666
Reflections used for R-free 3750
R-work 0.206
R-free 0.234
CC (work) 0.960
CC (free) 0.947
Number of nonhydrogen atoms  
Macromolecules 4608
Ligands 26
Solvent 224
Protein residues  
RMS (bonds) 0.005
RMS (angles) 1.307
Ramachandran favored (%) 96.88
Ramachandran allowed (%) 96.81
Ramachandran outliers (%) 0.0
Poor rotamer (%) 0.8
Clash score 2.32
Average B-factor (Å²)
Macromolecules 29.153
Ligands 35.953
Solvent 33.077
ᵃStatistics for the highest-resolution shell are shown in parentheses.
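Several Table 1 entries are related by standard formulas, which gives a quick internal-consistency check. The sketch below recomputes CC* from CC1/2 using the Karplus–Diederichs relation, the overall multiplicity from total and unique reflections, and the R-free test-set fraction, assuming the 72,666 refinement reflections include the 3750 free-set reflections. The helper names are illustrative.

```python
import math


def cc_star(cc_half: float) -> float:
    """CC* = sqrt(2 * CC1/2 / (1 + CC1/2))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def multiplicity(total_reflections: int, unique_reflections: int) -> float:
    """Average number of observations per unique reflection."""
    return total_reflections / unique_reflections


def free_fraction(n_free: int, n_refinement: int) -> float:
    """Fraction of refinement reflections set aside for R-free (assumed subset)."""
    return n_free / n_refinement


print(round(cc_star(0.998), 3), round(cc_star(0.666), 3))  # 0.999 0.894
print(round(multiplicity(269_845, 76_489), 1))             # 3.5
print(round(free_fraction(3_750, 72_666), 3))              # 0.052
```

Both CC* values and the overall multiplicity match the tabulated ones, and the free set is about 5% of the refinement reflections.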

The DCAF1-CYCA-117-70 structure contains two copies of the DCAF1 WDR domain in the crystal asymmetric unit. Well-resolved electron density for the entire CYCA-117-70 molecule was observed in only one of the two DCAF1 WDR chains (Figure S2). CYCA-117-70 binds close to the surface of the central channel of the DCAF1 WDR domain ring (Figure 4A), where it is held in place by hydrophobic interactions and a water-mediated hydrogen bond with surrounding residues (Figure 4B). Specifically, the 3-fluorophenyl group is nested in a pocket formed by the side chains of L1313, R1298, R1225, and C1227. The aminopiperidine moiety packs against the side chains of H1140, T1181, and P1329, while the pyrimidine group is sandwiched between T1139 and P1329 and engages in a water-mediated hydrogen bond with the backbone of R1298 and F1355. The morpholine group occupies a space close to the side chains of T1097 and T1135 (Figure 4B).

Figure 4.


Cocrystal structure of the DCAF1 WDR domain in complex with CYCA-117-70. (A) Top and side views of the DCAF1 WDR domain shown as a cartoon representation in slate blue, bound to CYCA-117-70 shown as yellow sticks. The compound binds close to the surface of the WDR ring central channel. (B) Zoomed-in view of the CYCA-117-70 binding site in chain A of the DCAF1-CYCA-117-70 cocrystal structure. CYCA-117-70 is shown as yellow sticks, water molecules are shown as red spheres, and the putative hydrogen bond is shown as black dashes. (C) Overlay of the DCAF1 monomer (slate blue surface) bound to CYCA-117-70 (yellow sticks) onto lentiviral Vpx (green) (PDB ID 4CC9; data from ref 22), revealing a steric overlap between the compound and Vpx.

Comparison of the DCAF1-CYCA-117-70 cocrystal structure with the structure of human DCAF1 in complex with the lentiviral accessory protein Vpx and mandrill SAMHD1 (PDB ID 4CC9)22 shows that the compound binds close to the Vpx binding site and appears to overlap with part of the Vpx helix that binds the surface of the DCAF1 WDR ring; the side chains of Vpx Lys84 and Phe80 clash with the compound (Figure 4C).
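The steric overlap in Figure 4C amounts to ligand and Vpx atoms occupying the same space once the two DCAF1 structures are superposed. A minimal sketch of such a clash check follows, assuming the coordinates are already in a common frame; the 2.5 Å cutoff, atom labels, and coordinates are hypothetical, and this is not the analysis reported in the study.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (label, xyz in angstroms)


def find_clashes(ligand: List[Atom], partner: List[Atom],
                 cutoff: float = 2.5) -> List[Tuple[str, str, float]]:
    """List (ligand atom, partner atom, distance) pairs closer than `cutoff`.

    Assumes both coordinate sets are already in a common frame, e.g. after
    superposing the two DCAF1 chains; no superposition is done here.
    """
    clashes = []
    for lig_label, lig_xyz in ligand:
        for partner_label, partner_xyz in partner:
            d = math.dist(lig_xyz, partner_xyz)
            if d < cutoff:
                clashes.append((lig_label, partner_label, round(d, 2)))
    return clashes


# Toy coordinates for illustration only (not taken from PDB 7SSE or 4CC9).
ligand = [("C1", (0.0, 0.0, 0.0)), ("F1", (1.5, 0.0, 0.0))]
vpx = [("K84 NZ", (0.0, 1.0, 0.0)), ("F80 CZ", (1.5, 0.0, 2.2)), ("A10 CB", (10.0, 0.0, 0.0))]
for clash in find_clashes(ligand, vpx):
    print(clash)
```

A distance-only cutoff ignores atom types and van der Waals radii; it is a coarse screen for overlaps rather than a full steric analysis.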

Discussion

MatchMaker uses the predictive power of deep learning to generalize DTI datasets to low-data or data-less targets. Combining this approach with cheminformatics filtering and assessment of docking poses led to the first cocrystal structure of DCAF1 in complex with a small-molecule ligand (PDB 7SSE). The DTI-based hit identification workflow selected 101 ligand candidates from a source library of over 2.8 M molecules, of which one compound was experimentally validated. This result illustrates the potential of DTI models, which offer a receptor-based machine learning strategy without the explicit need for target-specific training data. While a validated hit rate of 1 in 101 may be low relative to alternative virtual screening methodologies under ideal target conditions, it is important to note that (1) small-molecule ligands have been reported for only a handful of WDR domains, (2) the central pocket of WDR domains is poorly conserved, and (3) no DCAF1 chemical ligand was known when the virtual screening was conducted. WDR domains in general, and DCAF1 in particular, are therefore absent from training sets for AI applications. Because MatchMaker is an AI model, however, its hits and misses can be used to improve future hit rates. The DTI model used in this study was trained on simulated random negative examples, but future models could be trained on explicit, experimentally observed negative examples, potentially improving performance on low-data targets. While a KD of 70 μM is weak, the cocrystal structure provides a specific binding pose at the critical interface between Vpr/x and DCAF1, which could serve as a valuable starting point for developing inhibitors of DCAF1 protein–protein interactions and for generating proximity-induced degraders. Furthermore, CYCA-117-70 is a modular and developable molecule with an LE of 0.21 and no concerning functional groups, which could be optimized with minimal medicinal chemistry effort. Our results support the idea that the structural chemistry of receptor–drug interactions learned from the PDB can be applied to orphan proteins.

An additional noteworthy feature of CYCA-117-70 is its observed experimental selectivity for DCAF1 over WDR5. Because MatchMaker explicitly computes the rank of a molecule for a given target relative to thousands of proteins in the proteome, it inherently accounts for polypharmacology during compound selection. Indeed, the experimental selectivity observed for CYCA-117-70 is in line with its predicted ranks of 15 for DCAF1 and 927 for WDR5, even though no explicit counter screen against WDR5 was performed. Considering polypharmacology at the hit-finding stage has both advantages and disadvantages. On the one hand, an approach that identifies a selective molecule upfront has the potential to accelerate drug discovery relative to the obligate multistep approach typical of probe development and medicinal chemistry. On the other hand, it may miss viable hits for the target whose selectivity could be improved later. Molecular docking can compensate for the pose-independent nature of DTI models, offering a means to exclude library molecules that are geometrically incompatible with the pocket or, when SBDD conditions are more favorable, to discriminate between closely related proteins. A wide variety of computational tools is available for hit identification, lead optimization, and drug design, and no single application is a “magic bullet”. DTI-based hit identification can be used on its own or as a complement to SBDD, particularly in situations where SBDD may prove more difficult, such as when only apo and/or predicted (AlphaFold) structures are available.

Importantly, our cocrystal structure shows that while CYCA-117-70 occupies the central channel of the DCAF1 WDR domain (Figure 4A), it remains solvent-accessible, and its pyrimidine group could potentially serve as a linker attachment point for future DCAF1-recruiting PROTACs. A first step, however, would be to explore the SAR around this chemical template to improve its binding affinity. Critically, the ligand binds at the protein–protein interaction (PPI) site, making it a promising starting point for disrupting PPIs with viral accessory proteins.

DCAF1 was recently targeted by electrophilic PROTACs that covalently engage its WDR domain, leading to ubiquitination and degradation of the FKBP12 protein.17 We believe that more potent analogs of CYCA-117-70 would be ideal chemical handles for future PROTACs that recruit DCAF1 noncovalently, as a reversible binding mode could translate into different selectivity and toxicity profiles.

Since the deposition of the DCAF1-CYCA-117-70 structure (PDB 7SSE), the first publicly released cocrystal structure of DCAF1 bound to a small molecule, there have been notable advances in the discovery of small molecules for DCAF1,17,35,36 further validating that DCAF1 is not only a novel target but also a highly tractable and promising E3 ligase. Indeed, while this manuscript was under revision, two series of DCAF1 ligands were reported.35,36 One series derives from a ligand discovered by DNA-encoded chemical library screening37 that binds deeper in the WDR central cavity and therefore does not overlap with the site where the viral accessory proteins Vpr/x bind the DCAF1 WDR domain35 (Figure S3A). A compound from the second series, which was subsequently developed into a DCAF1-recruiting PROTAC,36 binds less deeply in the central WDR cavity, in a pose very similar to that of CYCA-117-70 (Figure S3B). This similarity supports further development of more potent CYCA-117-70 analogs as antagonists of viral accessory proteins and as chemical handles for targeted protein degradation, and it highlights the promise of the DTI approach for ligand discovery in ligand-less and low-data protein targets.

Conclusions

Here, we demonstrate that AI-based virtual screening can enable the rapid discovery of chemical ligands for orphan proteins. Combining the speed of MatchMaker with structure-based molecular docking allowed us to rapidly explore millions of compounds and to obtain the first published cocrystal structure of DCAF1 bound to a small molecule, CYCA-117-70. Our open science public–private collaborative framework allowed disclosure of our hit in the PDB within weeks of its discovery. We believe that adopting this or similar working models would benefit precompetitive research and accelerate drug discovery.

Acknowledgments

We would like to thank Peter Loppnau, Almagul Seitova, Ashley Hutchinson, Pegah Ghiabi, and Taraneh Hajian for protein expression and purification. This work is based upon research conducted at the Northeastern Collaborative Access Team beamlines, which are funded by the National Institute of General Medical Sciences from the National Institutes of Health (P30 GM124165). This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357. The Structural Genomics Consortium is a registered charity (no. 1097737) that receives funds from Bayer AG, Boehringer Ingelheim, Bristol Myers Squibb, Genentech, Genome Canada through Ontario Genomics Institute [OGI-196], EU/EFPIA/OICR/McGill/KTH/Diamond Innovative Medicines Initiative 2 Joint Undertaking [EUbOPEN grant 875510], Janssen, Merck KGaA (known as EMD in Canada and the US), Pfizer, and Takeda. M.S. gratefully acknowledges financial support from NSERC [Grant RGPIN-2019-04416].

Glossary

Abbreviations

AI

artificial intelligence

CRL4

cullin-4 RING ubiquitin ligase

DCAF1

DDB1 and CUL4 associated factor 1

DDB1

DNA damage-binding protein 1

DTI

drug–target interaction

EDVP

EDD, DYRK2, DDB1, and VprBP E3 ligase complex

HECT

homologous to the E6-AP carboxyl terminus

HIV

human immunodeficiency virus

LBDD

ligand-based drug design

RING

really interesting new gene

PROTACs

proteolysis-targeting chimeras

SBDD

structure-based drug design

SIV

simian immunodeficiency virus

Vpr/Vpx

viral protein R or viral protein X (lentiviral accessory proteins)

WDR

WD40 repeat

Data Availability Statement

Atomic coordinates and structure factors for the reported crystal structure have been deposited in the Protein Data Bank under the accession code 7SSE. MatchMaker is commercial software developed by Cyclica, Inc. and now owned by Recursion Pharmaceuticals Inc.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c00082.

  • Figure showing the domain architecture of human DCAF1 and the WDR fold, figure of the electron density of CYCA-117-70 and its binding site, figure depicting the comparison of CYCA-117-70 with other recently reported DCAF1 ligands (PDF)

  • SMILES strings of the 101 compounds computationally selected for experimental testing (XLSX)

  • SPR raw data for CYCA-117-70 (XLSX)

Author Contributions

# S.W.K. and J.O. contributed equally.

The authors declare the following competing financial interest(s): JO, DK, CY, SS, MK, and VS are employees of Recursion Pharmaceuticals Inc. and may own stock in Recursion Pharmaceuticals Inc. Recursion Pharmaceuticals owns and maintains MatchMaker. All authors declare no other conflicts of interest.

Supplementary Material

ci3c00082_si_001.pdf (431.2KB, pdf)
ci3c00082_si_002.xlsx (390KB, xlsx)
ci3c00082_si_003.xlsx (11.4KB, xlsx)

References

  1. Ehrlich P. The Mutual Relations between Toxin and Antitoxin. Boston Med. Surg. J. 1904, 150, 443–445. 10.1056/NEJM190404281501701.
  2. Langley J. N. On the Reaction of Cells and of Nerve-Endings to Certain Poisons, Chiefly as Regards the Reaction of Striated Muscle to Nicotine and to Curari. J. Physiol. 1905, 33, 374–413. 10.1113/jphysiol.1905.sp001128.
  3. MacKinnon S. S.; Madani Tonekaboni S. A.; Windemuth A. Proteome-Scale Drug-Target Interaction Predictions: Approaches and Applications. Curr. Protoc. 2021, 1, e302. 10.1002/cpz1.302.
  4. Schapira M.; Tyers M.; Torrent M.; Arrowsmith C. H. WD40 Repeat Domain Proteins: A Novel Target Class? Nat. Rev. Drug Discov. 2017, 16, 773–786. 10.1038/nrd.2017.179.
  5. Higa L. A.; Wu M.; Ye T.; Kobayashi R.; Sun H.; Zhang H. CUL4-DDB1 Ubiquitin Ligase Interacts with Multiple WD40-Repeat Proteins and Regulates Histone Methylation. Nat. Cell Biol. 2006, 8, 1277–1283. 10.1038/ncb1490.
  6. Sharma P.; Nag A. CUL4A Ubiquitin Ligase: A Promising Drug Target for Cancer and Other Human Diseases. Open Biol. 2014, 4, 130217. 10.1098/rsob.130217.
  7. Maddika S.; Chen J. Protein Kinase DYRK2 Is a Scaffold That Facilitates Assembly of an E3 Ligase. Nat. Cell Biol. 2009, 11, 409–419. 10.1038/ncb1848.
  8. Angers S.; Li T.; Yi X.; MacCoss M. J.; Moon R. T.; Zheng N. Molecular Architecture and Assembly of the DDB1-CUL4A Ubiquitin Ligase Machinery. Nature 2006, 443, 590–593. 10.1038/nature05175.
  9. Fischer E. S.; Scrima A.; Böhm K.; Matsumoto S.; Lingaraju G. M.; Faty M.; Yasuda T.; Cavadini S.; Wakasugi M.; Hanaoka F.; Iwai S.; Gut H.; Sugasawa K.; Thomä N. H. The Molecular Basis of CRL4DDB2/CSA Ubiquitin Ligase Architecture, Targeting, and Activation. Cell 2011, 147, 1024–1039. 10.1016/j.cell.2011.10.035.
  10. Nakagawa T.; Mondal K.; Swanson P. C. VprBP (DCAF1): A Promiscuous Substrate Recognition Subunit That Incorporates into Both RING-Family CRL4 and HECT-Family EDD/UBR5 E3 Ubiquitin Ligases. BMC Mol. Biol. 2013, 14, 22. 10.1186/1471-2199-14-22.
  11. Sakamoto K. M.; Kim K. B.; Kumagai A.; Mercurio F.; Crews C. M.; Deshaies R. J. Protacs: Chimeric Molecules That Target Proteins to the Skp1-Cullin-F Box Complex for Ubiquitination and Degradation. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 8554–8559. 10.1073/pnas.141230798.
  12. Zhang S.; Feng Y.; Narayan O.; Zhao L. J. Cytoplasmic Retention of HIV-1 Regulatory Protein Vpr by Protein-Protein Interaction with a Novel Human Cytoplasmic Protein VprBP. Gene 2001, 263, 131–140. 10.1016/s0378-1119(00)00583-7.
  13. Hossain D.; Ferreira Barbosa J. A.; Cohen É. A.; Tsang W. Y. HIV-1 Vpr Hijacks EDD-DYRK2-DDB1DCAF1 to Disrupt Centrosome Homeostasis. J. Biol. Chem. 2018, 293, 9448–9460. 10.1074/jbc.RA117.001444.
  14. Schabla N. M.; Mondal K.; Swanson P. C. DCAF1 (VprBP): Emerging Physiological Roles for a Unique Dual-Service E3 Ubiquitin Ligase Substrate Receptor. J. Mol. Cell Biol. 2019, 11, 725–735. 10.1093/jmcb/mjy085.
  15. Hrecka K.; Gierszewska M.; Srivastava S.; Kozaczkiewicz L.; Swanson S. K.; Florens L.; Washburn M. P.; Skowronski J. Lentiviral Vpr Usurps Cul4-DDB1[VprBP] E3 Ubiquitin Ligase to Modulate Cell Cycle. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 11778–11783. 10.1073/pnas.0702102104.
  16. Guo Z.; Kong Q.; Liu C.; Zhang S.; Zou L.; Yan F.; Whitmire J. K.; Xiong Y.; Chen X.; Wan Y. Y. DCAF1 Controls T-Cell Function via P53-Dependent and -Independent Mechanisms. Nat. Commun. 2016, 7, 10307. 10.1038/ncomms10307.
  17. Tao Y.; Remillard D.; Vinogradova E. V.; Yokoyama M.; Banchenko S.; Schwefel D.; Melillo B.; Schreiber S. L.; Zhang X.; Cravatt B. F. Targeted Protein Degradation by Electrophilic PROTACs That Stereoselectively and Site-Specifically Engage DCAF1. J. Am. Chem. Soc. 2022, 144, 18688–18699. 10.1021/jacs.2c08964.
  18. Waterhouse A.; Bertoni M.; Bienert S.; Studer G.; Tauriello G.; Gumienny R.; Heer F. T.; de Beer T. A. P.; Rempfer C.; Bordoli L.; Lepore R.; Schwede T. SWISS-MODEL: Homology Modelling of Protein Structures and Complexes. Nucleic Acids Res. 2018, 46, W296–W303. 10.1093/nar/gky427.
  19. Sugiyama M. G.; Cui H.; Redka D. S.; Karimzadeh M.; Rujas E.; Maan H.; Hayat S.; Cheung K.; Misra R.; McPhee J. B.; Viirre R. D.; Haller A.; Botelho R. J.; Karshafian R.; Sabatinos S. A.; Fairn G. D.; Madani Tonekaboni S. A.; Windemuth A.; Julien J.-P.; Shahani V.; MacKinnon S. S.; Wang B.; Antonescu C. N. Multiscale Interactome Analysis Coupled with Off-Target Drug Predictions Reveals Drug Repurposing Candidates for Human Coronavirus Disease. Sci. Rep. 2021, 11, 23315. 10.1038/s41598-021-02432-7.
  20. He Y.; Yang C.; Wang Y.; Sacher J. R.; Sims M. M.; Pfeffer L. M.; Miller D. D. Novel Structural-Related Analogs of PFI-3 (SRAPs) That Target the BRG1 Catalytic Subunit of the SWI/SNF Complex Increase the Activity of Temozolomide in Glioblastoma Cells. Bioorg. Med. Chem. 2022, 53, 116533. 10.1016/j.bmc.2021.116533.
  21. Krivák R.; Hoksza D. P2Rank: Machine Learning Based Tool for Rapid and Accurate Prediction of Ligand Binding Sites from Protein Structure. J. Cheminf. 2018, 10, 39. 10.1186/s13321-018-0285-8.
  22. Schwefel D.; Groom H. C. T.; Boucherit V. C.; Christodoulou E.; Walker P. A.; Stoye J. P.; Bishop K. N.; Taylor I. A. Structural Basis of Lentiviral Subversion of a Cellular Protein Degradation Pathway. Nature 2014, 505, 234–238. 10.1038/nature12815.
  23. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531. 10.1093/nar/gkac1052.
  24. Berman H. M.; Westbrook J.; Feng Z.; Gilliland G.; Bhat T. N.; Weissig H.; Shindyalov I. N.; Bourne P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242. 10.1093/nar/28.1.235.
  25. Hutchinson A.; Seitova A. Production of Recombinant PRMT Proteins Using the Baculovirus Expression Vector System. J. Vis. Exp. 2021, No. 173. 10.3791/62510.
  26. Minor W.; Cymborowski M.; Otwinowski Z.; Chruszcz M. HKL-3000: The Integration of Data Reduction and Structure Solution--from Diffraction Images to an Initial Model in Minutes. Acta Crystallogr. D Biol. Crystallogr. 2006, 62, 859–866. 10.1107/S0907444906019949.
  27. McCoy A. J.; Grosse-Kunstleve R. W.; Adams P. D.; Winn M. D.; Storoni L. C.; Read R. J. Phaser Crystallographic Software. J. Appl. Crystallogr. 2007, 40, 658–674. 10.1107/S0021889807021206.
  28. Emsley P.; Cowtan K. Coot: Model-Building Tools for Molecular Graphics. Acta Crystallogr. D Biol. Crystallogr. 2004, 60, 2126–2132. 10.1107/S0907444904019158.
  29. Murshudov G. N.; Vagin A. A.; Dodson E. J. Refinement of Macromolecular Structures by the Maximum-Likelihood Method. Acta Crystallogr. D Biol. Crystallogr. 1997, 53, 240–255. 10.1107/S0907444996012255.
  30. Winn M. D.; Ballard C. C.; Cowtan K. D.; Dodson E. J.; Emsley P.; Evans P. R.; Keegan R. M.; Krissinel E. B.; Leslie A. G. W.; McCoy A.; McNicholas S. J.; Murshudov G. N.; Pannu N. S.; Potterton E. A.; Powell H. R.; Read R. J.; Vagin A.; Wilson K. S. Overview of the CCP4 Suite and Current Developments. Acta Crystallogr. D Biol. Crystallogr. 2011, 67, 235–242. 10.1107/S0907444910045749.
  31. Chen V. B.; Arendall W. B.; Headd J. J.; Keedy D. A.; Immormino R. M.; Kapral G. J.; Murray L. W.; Richardson J. S.; Richardson D. C. MolProbity: All-Atom Structure Validation for Macromolecular Crystallography. Acta Crystallogr. D Biol. Crystallogr. 2010, 66, 12–21. 10.1107/S0907444909042073.
  32. Pettersen E. F.; Goddard T. D.; Huang C. C.; Couch G. S.; Greenblatt D. M.; Meng E. C.; Ferrin T. E. UCSF Chimera--a Visualization System for Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605–1612. 10.1002/jcc.20084.
  33. DeLano W.; Schrödinger, LLC. PyMOL. http://www.pymol.org/pymol.
  34. Hopkins A. L.; Groom C. R.; Alex A. Ligand Efficiency: A Useful Metric for Lead Selection. Drug Discovery Today 2004, 9, 430–431. 10.1016/S1359-6446(04)03069-7.
  35. Li A. S. M.; Kimani S.; Wilson B.; Noureldin M.; González-Álvarez H.; Mamai A.; Hoffer L.; Guilinger J. P.; Zhang Y.; von Rechenberg M.; Disch J. S.; Mulhern C. J.; Slakman B. L.; Cuozzo J. W.; Dong A.; Poda G.; Mohammed M.; Saraon P.; Mittal M.; Modh P.; Rathod V.; Patel B.; Ackloo S.; Santhakumar V.; Szewczyk M. M.; Barsyte-Lovejoy D.; Arrowsmith C. H.; Marcellus R.; Guié M.-A.; Keefe A. D.; Brown P. J.; Halabelian L.; Al-Awar R.; Vedadi M. Discovery of Nanomolar DCAF1 Small Molecule Ligands. J. Med. Chem. 2023, 66, 5041–5060. 10.1021/acs.jmedchem.2c02132.
  36. Schröder M.; Renatus M.; Liang X.; Meili F.; Zoller T.; Ferrand S.; Gauter F.; Li X.; Sigoillot F.; Gleim S.; Stachyra M.-T.; Thomas J.; Begue D.; Lefeuvre P.; Andraos-Rey R.; Chung B. Y.; Ma R.; Carbonneau S.; Pinch B.; Hofmann A.; Schirle M.; Schmiedberg N.; Imbach P.; Gorses D.; Calkins K.; Bauer-Probst B.; Maschlej M.; Niederst M.; Maher R.; Henault M.; Alford J.; Ahrne E.; Hollingworth G.; Thomä N. H.; Vulpetti A.; Radimerski T.; Holzer P.; Thoma C. R. Reinstating Targeted Protein Degradation with DCAF1 PROTACs in CRBN PROTAC Resistant Settings. bioRxiv 2023. 10.1101/2023.04.09.536153.
  37. Gironda-Martínez A.; Donckele E. J.; Samain F.; Neri D. DNA-Encoded Chemical Libraries: A Comprehensive Review with Succesful Stories and Future Challenges. ACS Pharmacol. Transl. Sci. 2021, 4, 1265–1279. 10.1021/acsptsci.1c00118.


Articles from Journal of Chemical Information and Modeling are provided here courtesy of American Chemical Society
