Abstract
Proteolysis targeting chimeras represent a class of drug molecules with a number of attractive properties, most notably the potential to work for targets that have so far been inaccessible to conventional small molecule inhibitors. Because of their different mechanism of action and physico-chemical properties, many of the methods designed and applied for the computer-aided design of traditional small molecule drugs are not applicable to proteolysis targeting chimeras. Here we review recent developments in this field, focusing on three aspects: de-novo linker design, estimation of absorption for beyond-rule-of-5 compounds, and the generation and ranking of ternary complex structures. Although the field is still young, we find that a good number of models and algorithms are already available, with the potential to assist the in-silico design of such compounds and to accelerate applied pharmaceutical research.
Keywords: PROTAC, Linker design, ADME, Drug design, Docking refinement, Protein-protein interactions, Ternary complexes, MD simulation
1. Introduction
Proteolysis targeting chimeras (PROTACs) are a class of compounds that represent an alternative to traditional small molecule drugs (TSMDs).[1] By binding simultaneously to a target (usually referred to as the protein of interest, POI) and an E3-ligase, they trigger the ubiquitination of the POI and its transport to the proteasome, where it is degraded. For a comprehensive discussion of this therapeutic modality we refer to the literature.[2], [3] Suffice it to say that this mode of action comes with a number of advantages compared to TSMDs. Because they must bind two proteins simultaneously, PROTACs are generally designed by combining two fragments, one binding to the POI (the warhead) and the other binding to the E3-ligase (the E3-binder). To satisfy steric requirements, the two fragments are usually not connected directly by a single covalent bond but by an additional fragment of varying size (the linker). Ideally, once the ternary complex has formed, this linker not only ensures spatial proximity of the two proteins but also improves the stability of the resulting ternary complex through favorable interactions with both proteins [4] (see Fig. 1).
Fig. 1.
The ternary complex and its interactions. Each protein (E3-ligase and POI) engages in two types of interactions: protein-protein interactions and protein-PROTAC interactions.
Computer-aided drug design (CADD) offers a range of methods and applications to assist the development of TSMDs. By design, PROTACs differ from such typical drug-like compounds in their mode of action, their size, and a range of other physico-chemical properties. Applying some established CADD methods to PROTACs is therefore not straightforward, and special algorithms and tools need to be developed and validated. Here we review recent developments in this field, focusing on tasks in early-stage drug design where CADD tools established for TSMDs need to be adjusted to work for PROTACs. If we assume that the type of E3-ligase used and its expression levels are established and not rate limiting, and that the drug is administered orally, these tasks include: 1) generation of small molecule libraries for virtual screening, with a focus on linker design, 2) estimation of bio-availability, i.e., absorption (solubility and permeability) of comparatively large and flexible compounds, and 3) optimization of the stability of the ternary complex formed by an E3-ligase, the POI, and the PROTAC molecule. In the following, available algorithms and tools are reviewed, and opportunities and limitations arising from the special nature of PROTACs are discussed. For the in-silico estimation of ternary complex stabilities we also extend the scope to algorithms whose application has so far been confined to binary protein complexes, as some of these can easily be extended to the ternary case. We conclude with a discussion of some pertinent issues and summarize what has been, and what still needs to be, accomplished towards a comprehensive arsenal of in-silico tools to assist the design of novel PROTAC molecules.
2. Linker generation
For TSMDs, libraries of small organic molecules or drug-like compounds are available, with recent solutions covering millions of compounds or more [5], [6]. Tools for the de-novo generation of such libraries have also been developed [7], [8]. Such libraries can be used as input for a virtual screening campaign when searching for hits against a new target. Although at least two digital libraries of PROTAC molecules are available [9], [10], their use for this purpose is limited: their size is confined to a few thousand compounds, and, owing to the choice of warheads, all compounds in these libraries are specific to established targets. Virtual screening of PROTAC molecules therefore requires several preliminary steps, including the selection of an E3-ligase, an E3-binder, and a warhead. Once these components have been established (a topic not discussed here), a PROTAC library can be generated by combining the selected E3-binder and warhead with different linkers, typically using a generative algorithm to produce a large number of diverse compounds. Before discussing available tools for this purpose, we stress that comparing linker generation algorithms is difficult because they are generative models. In 2019 Brown et al. published a framework for benchmarking models for de-novo molecular design [11]. Its so-called distribution-learning benchmarks assess how well models learn to generate molecules that resemble a training set, using five metrics: validity, uniqueness, novelty, Fréchet ChemNet distance, and Kullback-Leibler (KL) divergence. Unfortunately, not all publications in this field use the same metrics.
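The first three distribution-learning metrics, and a histogram-based KL divergence, can be illustrated with a short sketch. It assumes a caller-supplied `canonicalize` function (hypothetical here; in practice it would wrap a cheminformatics toolkit) that returns a canonical SMILES string, or `None` for unparsable input, and it simplifies the benchmark definitions (for example, it ignores the fixed sample sizes a real benchmark prescribes). Mapping the KL divergence to a score in (0, 1] via exp(-KL) is shown as one simple convention, not as the benchmark's exact aggregation.

```python
import math
from typing import Callable, Iterable, Optional


def distribution_metrics(
    generated: list[str],
    training: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> dict[str, float]:
    """Simplified validity, uniqueness and novelty of a set of generated SMILES."""
    canonical = [canonicalize(s) for s in generated]
    valid = [c for c in canonical if c is not None]
    unique = set(valid)
    # canonicalize the training set too, so different spellings of one molecule match
    train = {c for c in map(canonicalize, training) if c is not None}
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(unique - train) / len(unique) if unique else 0.0,
    }


def histogram_kl(ref_counts: list[float], gen_counts: list[float], eps: float = 1e-10) -> float:
    """KL(P_ref || P_gen) between two histograms of one descriptor over identical bins."""
    ref_total, gen_total = sum(ref_counts), sum(gen_counts)
    kl = 0.0
    for rc, gc in zip(ref_counts, gen_counts):
        p = rc / ref_total
        q = max(gc / gen_total, eps)  # floor empty generated bins to avoid log(p / 0)
        if p > 0:
            kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    toy = lambda s: None if "?" in s else s.upper()  # stand-in for a real canonicalizer
    print(distribution_metrics(["cco", "CCO", "c?c", "CCN"], ["CCO"], toy))
    print(math.exp(-histogram_kl([5, 3, 2], [4, 4, 2])))  # score in (0, 1]
```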
DeLinker, a graph-based generative model, was the first molecular generation model to incorporate 3D structural information directly into the design process, taking the relative orientation and distance of the fragments as input.[12] SyntaLinker uses a transformer architecture and treats linker generation as a natural language processing task on SMILES strings.[13] Although both models are capable of generating useful linkers, their support for simultaneously optimizing physico-chemical properties is limited. A number of models use reinforcement learning to explore and exploit chemical space, applying multiple scoring functions. Among linker generation models, the first to make use of reinforcement learning was PROTAC-RL [14]. Zheng et al. present a transformer model called Proformer as their generative method and, to improve the sampling of linkers with favorable properties, combine memory-assisted reinforcement learning with different scoring functions. Comparing their results with DeLinker and SyntaLinker, the authors find that the most significant difference between the three methods is the recovery rate: after retraining on the PROTAC training set, DeLinker and SyntaLinker achieved recovery rates of 4.8% and 10.4%, respectively, whereas Proformer achieved the highest recovery rate, 43%. Another recently published model is Link-INVENT [15]. It is based on its predecessor REINVENT, which uses a recurrent neural network encoder-decoder architecture to generate molecules [16]. Like PROTAC-RL, it employs diverse, flexible scoring functions in its reinforcement learning environment, and, like REINVENT, it offers control over the linker itself (e.g., linker length and branching). Unfortunately, because it is evaluated with different metrics than the above-mentioned algorithms, comparison with other state-of-the-art fragment-linking methods is not straightforward. In a recent publication, Nori et al. [17] use the existing RL-GraphINVENT architecture [18], built upon GraphINVENT [19], a graph-based generative model. They train a surrogate model on a highly sparse data set of PROTACs with available DC50 values and use it as one of the scoring functions in the reinforcement learning environment. The approach is interesting, but given the small number of training data points and the absence of protein information, it is not clear how well this DC50 model generalizes, as the authors themselves note. Furthermore, the authors claim to generate active degraders for the POI IRAK3, but support this only with predictions from the same surrogate model that served as a scoring function, which is circular. As acknowledged in the paper, experimental assays would be the appropriate way to mitigate the bias introduced by the surrogate model and the RL framework when generating active degraders for IRAK3.
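The reinforcement learning loop of REINVENT-style generators is commonly described as follows: each sampled molecule receives a combined score S in [0, 1], and the agent is trained towards an "augmented" prior likelihood, log P_aug = log P_prior + σ·S, by minimizing (log P_aug − log P_agent)². The sketch below shows that general scheme under stated assumptions; it is not the exact implementation of PROTAC-RL, Link-INVENT, or Nori et al. The component names, the window transform, the weights, and σ are hypothetical. A surrogate DC50 predictor, as used by Nori et al., would enter simply as one more scored component.

```python
import math


def _logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def window_score(x: float, low: float, high: float, softness: float = 1.0) -> float:
    """Hypothetical transform: close to 1 inside [low, high], decaying smoothly outside."""
    return _logistic((x - low) / softness) * _logistic((high - x) / softness)


def weighted_geometric_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine component scores in [0, 1]; any zero component vetoes the molecule."""
    if any(scores[name] <= 0.0 for name in weights):
        return 0.0
    total = sum(weights.values())
    return math.exp(sum(w * math.log(scores[name]) for name, w in weights.items()) / total)


def augmented_likelihood_loss(logp_prior: float, logp_agent: float, score: float, sigma: float) -> float:
    """Squared gap between the score-augmented prior likelihood and the agent likelihood."""
    logp_augmented = logp_prior + sigma * score
    return (logp_augmented - logp_agent) ** 2


if __name__ == "__main__":
    # hypothetical components for one sampled PROTAC: linker length and a surrogate potency score
    components = {"linker_length": window_score(9, low=5, high=14), "surrogate_dc50": 0.6}
    s = weighted_geometric_mean(components, {"linker_length": 1.0, "surrogate_dc50": 2.0})
    print(s, augmented_likelihood_loss(logp_prior=-35.0, logp_agent=-30.0, score=s, sigma=60.0))
```

The geometric mean is one common way to combine components: unlike an arithmetic mean, a molecule that fails one objective completely cannot be rescued by excelling at another.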
3. Absorption
The two major physico-chemical properties that determine absorption and bio-availability of an orally administered drug are its solubility and its membrane permeability. PROTACs are usually larger than typical small molecule drugs and have physico-chemical properties and molecular descriptors that place them well outside the rule-of-five (Ro5) space [20]. In recent years, increasing numbers of beyond-Ro5 (bRo5) compounds have been developed as drugs, and the general applicability of the original Ro5 has been called into question [21], [22], [23]. In addition, alternative formulation strategies can be used to improve bio-availability, in particular for compounds with insufficient solubility [24].
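For reference, Lipinski's rule of five flags poor oral absorption as more likely when a compound exceeds more than one of four thresholds (MW > 500 Da, cLogP > 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The sketch assumes the descriptors have been computed elsewhere; the example values are invented, not taken from a real PROTAC. Most PROTACs trigger the alert by construction, which is why Ro5-based filters are of limited use for them.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float     # molecular weight in Da
    clogp: float  # calculated octanol/water logP
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> list[str]:
    """Names of the Lipinski thresholds this compound exceeds."""
    checks = {
        "MW > 500": d.mw > 500,
        "cLogP > 5": d.clogp > 5,
        "HBD > 5": d.hbd > 5,
        "HBA > 10": d.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def ro5_alert(d: Descriptors) -> bool:
    """True when more than one rule is violated, i.e. poor absorption is flagged as more likely."""
    return len(ro5_violations(d)) > 1


if __name__ == "__main__":
    protac_like = Descriptors(mw=950.0, clogp=4.2, hbd=4, hba=14)  # invented values
    print(ro5_violations(protac_like), ro5_alert(protac_like))
```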
Two recent articles discuss the development of PROTACs and similar drugs, including the physico-chemical properties that are relevant in this field [25], [26]. Both approach these questions from an applied perspective and are therefore useful for understanding what CADD methods need to deliver in order to be useful in practice. Pike et al. report multiple issues they encountered when investigating degrader molecules with routine assays [26]. They find that “Relatively minor structural modifications can also have a significant negative impact on oral bioavailability […] suggesting that solubility and/or permeability are highly sensitive to factors other than lipophilicity, such as shape, flexibility/ rigidity, pKa, and so on.” In a similar study, Cantrill et al. systematically review various analytical methods used to determine ADME-related properties and also provide results obtained in-house at Roche [25]. They stress that degraders, compared to TSMDs, can pose particular challenges for these methods, finding that established high-throughput methods frequently need to be replaced by slower and more expensive analytical techniques to obtain reproducible and reliable results. They add that “well-validated in silico approaches could be used as well in lieu of experimental approaches”. However, regarding solubility prediction they state that “because degraders and other bRo5 molecules are new chemical entities, there is a high likelihood that their structures are not covered by existing computational tools”. In an internal study they found that, when trying to predict solubilities for a set of degraders using ADMET Predictor 9.5 (Simulations Plus Inc.), “in all cases, the software gave alerts that the structures were not covered by the data set used to set up the model”.
Taken together, these accounts suggest that in-silico methods for predicting ADME-related properties could be a valuable complementary approach to assist PROTAC design, provided one can demonstrate that for PROTAC-like molecules they deliver results with an accuracy comparable to that achieved for TSMDs. A study confirming that this is not necessarily the case was published by Jimenez et al., who used a number of existing quantitative structure-property relationship (QSPR) models to predict solubilities of PROTAC molecules, concluding that “fast and cheap tools routinely used in drug discovery are not able to predict PROTAC solubility” [27].
Traditionally, 2D descriptors such as computed logD (clogD) and topological polar surface area (TPSA) have been used successfully in models of TSMD solubility. For bRo5 compounds, including PROTACs, flexibility and conformational preferences, i.e., the 3D structure in solution, are expected to be more important. Sebastiano et al. investigated the solubility of a structurally diverse set of 11 bRo5 drugs at pH 7.4 [28]. Their best model was based on the conformation of each drug with the maximum molecular 3D polar surface area, yielding a correlation of r² = 0.83 with experimental solubilities. The small size of the compound set calls for caution, however, and more studies will be required to confirm this finding.
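How much caution a small data set calls for can be quantified with an approximate confidence interval for a correlation coefficient. The sketch uses the Fisher z-transform and assumes that r² is the square of Pearson's r for a single-descriptor linear relationship; the approximation itself is rough at n = 11, so the interval is indicative only.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for Pearson's r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    lo, hi = fisher_ci(math.sqrt(0.83), n=11)
    print(f"r in [{lo:.2f}, {hi:.2f}], r^2 in [{lo ** 2:.2f}, {hi ** 2:.2f}]")
```

For r² = 0.83 and n = 11 the interval for r is roughly 0.69 to 0.98, which corresponds to r² anywhere between about 0.47 and 0.95.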
Bergstrom et al. analyzed the properties that govern the solubility of different classes of organic molecules. They state that for so-called “grease balls”, i.e., relatively large, flexible, and lipophilic molecules, solubility is generally solvation-limited [24], [29]. Most PROTACs likely belong to this class, so their relative solubilities should depend on their structure and interactions in solution rather than on the solid state. Consequently, physics-based screening of solubility, focusing on relative free energies of solvation, might be an attractive alternative to more empirical QSPR-based approaches. To our knowledge, no systematic study of PROTAC molecules has been published, but results from studies of TSMDs should in principle be transferable. For example, Scheen et al. calculated solvation free energies for a subset of the FreeSolv database using free energy perturbation (FEP) combined with ML-based corrections, and conclude that “When compared with ML, the FEP/ML approach outperforms FEP with a much smaller training set size” [30].
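At its core, FEP rests on an exponential average. The minimal sketch below implements the Zwanzig estimator ΔG = −kT ln⟨exp(−ΔU/kT)⟩₀, which in an alchemical solvation calculation would be applied between neighbouring λ states and summed; a hybrid FEP/ML scheme would then add a learned correction to the result. It is not necessarily the estimator used by Scheen et al., and the energies in the usage example are invented.

```python
import math


def zwanzig_delta_g(delta_u: list[float], kT: float) -> float:
    """Exponential-averaging (Zwanzig) estimate of G_1 - G_0.

    delta_u holds U_1(x) - U_0(x) for configurations x sampled in state 0; the result
    has the same energy unit as delta_u and kT.
    """
    exponents = [-du / kT for du in delta_u]
    m = max(exponents)  # log-sum-exp shift keeps exp() from overflowing
    log_mean = m + math.log(sum(math.exp(a - m) for a in exponents) / len(exponents))
    return -kT * log_mean


if __name__ == "__main__":
    kT = 0.593  # kcal/mol at about 298 K
    print(zwanzig_delta_g([1.2, 0.8, 1.5, 0.9, 1.1], kT))
```

The estimate is always at most the mean of ΔU and is dominated by rare low-ΔU samples, which is why sufficient overlap between neighbouring states, and hence sampling, matters so much for large, flexible molecules.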
The major molecular descriptors used to estimate membrane permeability are the size or molecular weight (MW) of a compound and its lipophilicity. Some studies suggest that an MW above approximately 1000 Da results in compounds with a passive permeability approaching zero [31], [32]. Interestingly, about one third of the PROTAC compounds in the literature and in drug development pipelines have an MW between 1000 and 1700 Da [10]. A possible explanation is molecular chameleonicity, a concept describing compounds whose solvent-exposed polar surface areas vary dynamically depending on the environment. The concept and its ramifications for drug design are discussed extensively in a series of papers by the group of Jan Kihlberg at Uppsala University. In the first paper, the authors approximate 3D polar surface areas (3D-PSA), and their minimum and maximum values, for 24 bRo5 compounds based on available crystal structures, and find that the correlation (r²) between permeability and a simple model based on the minimum 3D-PSA ranges from 0.5 to 0.9, depending on the partial-charge cutoff used to identify polar atoms [28]. Compared with the result obtained with the 2D descriptor TPSA (r² = 0.36), this is a clear improvement. However, the small test set and the dependence on the availability of several X-ray structures for each compound limit the applicability of this model. In a follow-up study, the authors used a larger set of macrocycles (47 training, 23 test) and generated conformers with dedicated software (OMEGA, OpenEye). A large number of descriptors were computed from these conformers, and both a multiple linear regression and a random forest (RF) model were trained to categorize permeabilities. Interestingly, they find that “models based on 2D descriptors, which are fast to calculate, outperformed those based on more time-consuming sampling of 3D conformations”.
They attribute the failure of 3D descriptors to provide a general model for permeability to limitations of force-field-based estimates of conformer energies, and support this conjecture with experimental NMR data [33]. In another study, the same authors use both experimental NMR spectra and molecular dynamics (MD) simulations to explain trends in permeability for three different CRBN-based PROTACs in terms of the conformational preferences of the compounds in solution. They find that MD simulations in explicit chloroform can be used for the prospective, qualitative ranking of cell permeability in the design of PROTACs.[34] This result is promising, as the method might be turned into a model for estimating relative permeabilities of PROTACs in silico and from first principles. However, the study was confined to only three compounds, and the MD simulations were probably too short (100 ns) for exhaustive sampling of conformational space, so more work is required to confirm this finding. In an attempt to build a simpler model for PROTACs that is fast enough for high-throughput virtual screening, Poongavanam et al. tested various ML approaches using simple 2D descriptors for binary classification of permeability. They identify two methods that correctly classify about 80% of VHL-based PROTACs in a test set. However, their results for CRBN-based PROTACs are much poorer, which they attribute to the unbalanced nature of the available training data [35].
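The minimum/maximum 3D-PSA idea behind chameleonicity can be sketched in a few lines. The input format is a hypothetical simplification: for each conformer, a list of per-atom (solvent-exposed area in Å², partial charge) pairs, with atoms counted as polar when the magnitude of their partial charge reaches a chosen cutoff. Real implementations differ in how they compute areas and which atoms (e.g. polar hydrogens) they include.

```python
from typing import Sequence

# Hypothetical input: one conformer = per-atom (solvent-exposed area in Å^2, partial charge in e)
Conformer = Sequence[tuple[float, float]]


def polar_surface_area(conformer: Conformer, charge_cutoff: float) -> float:
    """3D-PSA of one conformer: exposed area of atoms whose |partial charge| reaches the cutoff."""
    return sum(area for area, charge in conformer if abs(charge) >= charge_cutoff)


def psa_range(conformers: Sequence[Conformer], charge_cutoff: float) -> tuple[float, float]:
    """Minimum and maximum 3D-PSA over an ensemble; a wide range hints at chameleonic behaviour."""
    values = [polar_surface_area(c, charge_cutoff) for c in conformers]
    return min(values), max(values)


if __name__ == "__main__":
    extended = [(20.0, -0.5), (30.0, 0.1), (10.0, 0.45)]  # polar atoms exposed
    folded = [(5.0, -0.5), (40.0, 0.1), (2.0, 0.45)]      # polar atoms buried, e.g. by an internal H-bond
    print(psa_range([extended, folded], charge_cutoff=0.4))
```

Changing `charge_cutoff` changes which atoms count as polar, which mirrors the reported sensitivity of the correlation to the partial-charge cutoff.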
As with solubility, permeability can also be estimated using physics-based methods. Kamenik et al. considered a series of six macrocyclic molecules, for which they performed MD simulations in water and chloroform to calculate free energies of solvation in both solvents. The resulting transfer free energies show an excellent correlation with experimental permeabilities (Pearson r = 0.92) [36]. Thai et al. calculated membrane permeabilities for a series of 26 compounds from steered MD simulations pulling the compounds through a model membrane [37]. Both physics-based accounts, especially the work by Thai et al., deal with molecules that are small compared to PROTACs. It remains to be shown whether the additional sampling (simulation time) required to obtain converged free energies for molecules as large and flexible as PROTACs can be achieved with reasonable computational resources.
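The link between solvation free energies in two solvents and a partition coefficient is a standard thermodynamic relation: with ΔG_transfer = ΔG_solv(chloroform) − ΔG_solv(water), the chloroform/water partition coefficient satisfies log₁₀ K = −ΔG_transfer / (RT ln 10). The sketch applies it, with chloroform serving as a proxy for the membrane interior; it illustrates the relation and is not a description of how Kamenik et al. processed their data. The input values are invented.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log10_partition(dg_solv_water: float, dg_solv_chloroform: float, temperature: float = 298.15) -> float:
    """log10 of the chloroform/water partition coefficient from solvation free energies in kcal/mol."""
    dg_transfer = dg_solv_chloroform - dg_solv_water  # free energy of moving water -> chloroform
    return -dg_transfer / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    # invented solvation free energies: the compound is solvated 2 kcal/mol better in chloroform
    print(log10_partition(dg_solv_water=-18.0, dg_solv_chloroform=-20.0))
```

At room temperature, every 1.36 kcal/mol of transfer free energy shifts log K by one unit, so errors of about 1 kcal/mol in each solvation free energy already matter for ranking.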
4. Ternary complex structures and stabilities
4.1. Static structures and scoring functions
In 2019, the first attempt to predict PROTAC-based ternary complexes (TCs) in silico was published by Drummond and Williams [38]. They designed and tested a number of workflows involving docking and scoring, and implemented them in the commercial modeling software MOE [39]. In the Discussion the authors state: “even in the most successful cases, where the methods of this paper produced filtered databases of ternary complexes with ∼ 40% crystal-like poses, it is still difficult to a-priori identify which 40% of the filtered output is crystal-like”, concluding that “if the ultimate goal is to use modeling techniques […] as a surrogate for solving X-ray structures of ternary complexes, then this goal will likely be unmet for the foreseeable future”. In a follow-up paper the same authors modified their algorithm and reported somewhat improved results, but identifying a single TC structure that is sufficiently close to the native structure remains elusive.
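Many docking-based TC workflows include a geometric filter of the following kind: a protein-protein pose is kept only if the two linker attachment points can actually be bridged by the linker. The sketch assumes that attachment-atom coordinates (in Å, in a common frame) are already available for each docked pose and for each PROTAC conformer; the function names and the 1 Å tolerance are hypothetical, and this is not the MOE implementation used by Drummond and Williams.

```python
import math

Point = tuple[float, float, float]


def linker_span(anchor_pairs: list[tuple[Point, Point]]) -> tuple[float, float]:
    """Min/max distance between the two linker attachment atoms over a PROTAC conformer ensemble."""
    distances = [math.dist(a, b) for a, b in anchor_pairs]
    return min(distances), max(distances)


def filter_poses(
    poses: dict[str, tuple[Point, Point]],
    span: tuple[float, float],
    tolerance: float = 1.0,
) -> list[str]:
    """Keep protein-protein poses whose attachment-point distance a linker conformer could bridge."""
    lo, hi = span
    return [
        name for name, (a, b) in poses.items()
        if lo - tolerance <= math.dist(a, b) <= hi + tolerance
    ]
```

Such a filter only removes impossible poses; as the quotation above makes clear, it does not solve the harder problem of identifying the native-like poses among those that remain.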
PRosettaC, a variant of the ROSETTA protein modeling tool specifically designed to model PROTAC-based TCs, was used to predict TC structures with a protocol including docking, scoring, and clustering [40]. After establishing appropriate hyper-parameters using six experimental TC structures, the authors tested their model by predicting six additional PROTAC-based TCs with available co-crystal structures. For one of the six complexes, the native structure did not resemble any of the predicted clusters; for three others, a near-native structure was among the predictions, but the corresponding clusters were not ranked anywhere close to the top. For one complex, a cluster representing near-native structures was ranked third, and for another, first. Although the statistics are rather poor with only six cases, one can conclude that in the majority of cases this algorithm cannot predict the native structure with any confidence.
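Clustering of docking poses is often done with a simple greedy scheme: repeatedly take the pose with the most unassigned neighbours within an RMSD cutoff as a cluster centre, and remove it together with those neighbours. The sketch is similar in spirit to the approach popularized by ClusPro; PRosettaC's own clustering may differ, and the toy RMSD matrix is invented.

```python
def greedy_cluster(rmsd: list[list[float]], cutoff: float) -> list[tuple[int, list[int]]]:
    """Greedy neighbour-count clustering of poses from a symmetric pairwise RMSD matrix.

    Returns (centre, members) tuples in the order the clusters were formed.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        candidates = sorted(remaining)  # sorted so that ties go to the lowest index
        centre = max(candidates, key=lambda i: sum(rmsd[i][j] <= cutoff for j in remaining))
        members = [j for j in candidates if rmsd[centre][j] <= cutoff]
        clusters.append((centre, members))
        remaining.difference_update(members)
    return clusters


if __name__ == "__main__":
    toy = [
        [0, 1, 1, 10, 10],
        [1, 0, 1, 10, 10],
        [1, 1, 0, 10, 10],
        [10, 10, 10, 0, 1],
        [10, 10, 10, 1, 0],
    ]
    print(greedy_cluster(toy, cutoff=2.0))
```

Ranking clusters by population assumes that near-native poses are sampled more often than decoys, an assumption that, as the PRosettaC results show, does not always hold.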
A common limitation of the models discussed so far is that they generally used so-called “bound structures” as input for docking. That is, the two protein monomer structures used for docking were taken from the experimental ternary complex structures, thereby supplying additional information (conformational details of the residues near the protein-protein interface) that is not available in the intended use case. Protein complex formation can cause anything from subtle to rather drastic changes in the relative orientations and conformations of surface residues of the involved proteins, compared with the structures of the individual proteins in solution or in crystal structures not involving the particular TC in question [41]. The first account using unbound protein structures as input was published in 2021 by Weng et al. [42] With a relatively complex workflow involving docking, scoring, filtering, re-scoring, clustering, and refinement steps, the authors produce models of medium or high quality (by CAPRI criteria) for 12 out of 14 experimental TC structures. A cluster containing at least one model close to the experimental structure was ranked first in five out of 14 cases, but in only one of these was the structure of “high quality” by CAPRI criteria. The authors do not clarify how many structures each cluster contained, or whether the remaining structures were also reasonably close to native. In a recent preprint, Rao et al. suggest a workflow similar to Weng’s, the main differences being the docking stage, for which they propose an algorithm based on Bayesian optimization to accelerate this step [43], and an additional refinement stage involving short simulated-annealing MD simulations combined with molecular mechanics generalized Born surface area (MMGBSA) based scoring. Compared with Weng et al. [42], results improved somewhat, but consistent top-ranking of near-native models was still not achieved.
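The CAPRI quality classes referred to above combine the fraction of native contacts (fnat) with a ligand RMSD (L-RMSD, after superposing the receptor) and an interface RMSD (I-RMSD). The sketch encodes the commonly cited thresholds as we understand them; they should be checked against the official CAPRI definitions before being relied upon.

```python
def capri_class(fnat: float, l_rmsd: float, i_rmsd: float) -> str:
    """CAPRI-style quality class from fnat (0..1), ligand RMSD and interface RMSD (both in Å)."""
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"
```

The classes are nested, so a model is reported in the highest class whose conditions it meets; a "medium" model can still have an L-RMSD of up to 5 Å, well above the accuracy typically needed for structure-based design.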
In the context of drug design, a predicted TC structure can be useful in two scenarios. First, it can serve as the basis for structure-based drug design (SBDD) in the hit-to-lead phase, where a medicinal chemist might use 3D representations of the structure, in particular the binding pose of the PROTAC molecule, to suggest modifications that can be expected to improve binding and thereby the stability of the TC. Second, the structures can be used as input for more extensive calculations that approximate the relative binding free energies of the complexes in silico (see below), in order to rank different PROTAC molecules for a given target (virtual screening).
In most cases, the approaches discussed above can only provide a list of dozens of, possibly widely differing, binding poses for a given TC. In traditional SBDD, involving small molecule inhibitors and proteins with well-defined concave binding sites, an accuracy threshold of around 2 Å for the ligand structure is commonly used to identify binding poses that can be expected to provide useful insights. If we apply this threshold to the PROTAC molecule in ternary complexes, it appears that available methods cannot yet predict TC structures with the accuracy and confidence required for applied research. In fact, even for the arguably simpler task of modeling binary protein-protein complexes, accurate prediction of binding poses is not yet achievable with the type of approaches discussed above.
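Applying the 2 Å criterion requires a ligand RMSD measured in the frame of the protein: the model is first superposed on the reference using protein atoms (for a TC, for instance the backbone of one or both proteins), and the RMSD of the PROTAC atoms is then computed without refitting. A minimal NumPy sketch using the Kabsch algorithm, assuming matched atom order between model and reference; the function names are ours.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation R and translation t that best map mobile onto target (rows are atoms)."""
    mobile_c, target_c = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mobile_c).T @ (target - target_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid improper rotations (reflections)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, target_c - mobile_c @ rot.T


def ligand_rmsd(model_rec: np.ndarray, ref_rec: np.ndarray,
                model_lig: np.ndarray, ref_lig: np.ndarray) -> float:
    """Superpose on receptor atoms, then compute ligand RMSD (Å) without refitting the ligand."""
    rot, t = kabsch(model_rec, ref_rec)
    moved = model_lig @ rot.T + t
    return float(np.sqrt(((moved - ref_lig) ** 2).sum(axis=1).mean()))
```

A pose would then count as accurate if `ligand_rmsd(...) <= 2.0`; symmetry-equivalent atoms in the PROTAC, which a careful evaluation has to account for, are ignored here.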
In 2018, Han et al. published a benchmark data set of binary protein-protein complexes with known (experimentally determined) structures and binding affinities, and used it to test four different scoring functions for protein-protein interactions (PPIs) [44]. In the best case, using the ATTRACT scoring function, the authors found a success rate of 0.78 in identifying the correct binding pose among a set of decoys. For binding poses of at least medium accuracy (by CAPRI standards), the ZRANK scoring function performed best, with a success rate of 0.84. These numbers are rather impressive, but one has to keep in mind that docking was performed with bound structures and that the experimental structure was included in the decoy sets, representing a best-case scenario. If bound structures are not available, predicted binary complex structures will in many, perhaps most, cases not be accurate enough to meet the threshold mentioned above. For PROTAC-based TCs this supports the conclusion drawn above, all the more so because 1) in this exercise the correct binding pose was provided as input, presuming that it is found in the sampling stage, which is not guaranteed for the more difficult case of TCs, 2) ranking binding poses must generally be expected to be more difficult for TCs than for binary complexes, and 3) the benchmark set contains primarily obligate protein-protein complexes, which must be expected to feature stronger interactions than the PPIs created in a PROTAC-mediated TC. Although a number of new PPI scoring functions, including some based on machine learning (ML), have been proposed since the study of Hao et al. was published [45], to our knowledge this performance has not been qualitatively improved upon.
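Success rates such as those above are typically computed per target and then averaged: a target counts as a success if at least one near-native pose (e.g. CAPRI medium or better) appears among the top n poses after ranking by score. The sketch assumes lower scores are better; the exact definition used by Han et al. may differ. Note that when the native structure is part of the decoy set, as in that benchmark, such numbers are upper bounds for the realistic case.

```python
def ranked_flags(scores: list[float], near_native: list[bool]) -> list[bool]:
    """Order near-native flags by score, assuming lower scores are better."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [near_native[i] for i in order]


def top_n_success(per_target: list[list[bool]], n: int) -> float:
    """Fraction of targets with at least one near-native pose among the n best-ranked poses."""
    return sum(any(flags[:n]) for flags in per_target) / len(per_target)


if __name__ == "__main__":
    target_a = ranked_flags([-10.0, -12.0, -5.0], [False, True, False])
    target_b = ranked_flags([1.0, 2.0, 3.0], [False, False, True])
    print(top_n_success([target_a, target_b], 1), top_n_success([target_a, target_b], 3))
```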
In summary, we find that existing fast docking and scoring algorithms, including refinement of TCs by short MD simulations, cannot be expected to provide results with sufficient accuracy and confidence to be useful in applied SBDD. This holds for binary protein-protein complexes, and for TCs involving PROTAC molecules the situation is most likely even worse. Alternative approaches that account for structural dynamics and provide more accurate estimates of binding free energies might be required to meet this goal.
4.2. Refinement and enhanced sampling
Feenstra et al. performed coarse-grained (CG) MD simulations of binary protein complexes, including 15 targets from the CAPRI set, to rank binding poses. The authors state that their approach “shows a comparable performance when compared with other state-of-the-art docking scoring functions.” Since the parameterization of CG model potentials can be rather cumbersome, especially when non-peptide molecules such as PROTACs are involved, this approach must be considered sub-optimal [46]. A refinement protocol similar to the one used by Rao et al. [43] was employed by Zacharias [47]. For two of the three binary complexes he studied, refining initial docking structures with a short (200 picoseconds) MD simulation in implicit solvent (MMGBSA), with restraints on the backbone atoms, yielded poses with lower RMSD with respect to the X-ray structure. For the remaining complex, the one with the weakest experimental binding energy, the RMSD values in fact increased in most cases. Given this, and the small number of test cases, the result of this study is ambiguous. If anything, it suggests that this is probably not a good refinement method for PROTAC-based complexes, where the protein-protein interaction must be expected to be comparatively weak. Radom et al. use a protocol involving short conventional MD simulations to refine docking results for two different protein complexes. The major difference from the previous account is that the authors used explicit-solvent MD simulations, with the simulation temperature raised stepwise from 303 to 390 K over a total simulation time of approximately 100 nanoseconds [48]. For both test systems, complexes close to the experimental structure remain close to their initial structure while decoys do not, and some decoys even convert to a more native-like structure as the temperature is raised.
The results look compelling, but again the small test set (two cases) does not allow a comprehensive assessment of this method, especially since in one of the two cases the bound structures from the experimental complex structure were used as input for docking. A similar approach was published by Shinobu et al. [49]. For two binary protein complexes, decoys are refined using conventional MD simulations, and relative binding free energies are approximated using a method that, to our knowledge, has not been widely tested. As in the two previous accounts, the authors find that over a simulation time of 100 nanoseconds near-native structures remain close to their initial conformations while decoys do not.
A more comprehensive account, with a substantial number of test cases, was published by Zacharias and co-workers in a series of three papers in 2019–2020. For 20 protein complexes with experimental structures and, in most cases, known binding energies, they generated binary complex structures using the ATTRACT docking software. Starting from the resulting complexes, binding free energies were calculated 1) after energy minimization (EM), 2) after a short (30 ps) MD simulation, and 3) from extended MD simulations using umbrella sampling (US). In all cases (including US), a relatively complex set of restraints was applied to keep the docked complex and the secondary/tertiary structure of the individual proteins close to their initial geometries. In a final step, the US results were corrected by explicitly determining the contributions of the restraints to the absolute binding free energies. In all simulations the solvent was modeled implicitly. US was performed in 20–55 windows, with a simulation time of only 1 nanosecond per window. Considering the complex phase space of a pair of interacting proteins at atomic resolution, with rugged surfaces and intertwined side chains, this simulation time appears exceedingly short; apparently, the comprehensive set of applied restraints nevertheless allows for converged results. The authors not only found that in the majority of cases a correct or near-native binding pose was identified as the one with the lowest binding free energy, but also report a reasonably good correlation between calculated and experimental binding energies [50]. In the next study, the same authors employed a method called repulsive scaling replica exchange MD (RS-REMD), a variant of Hamiltonian REMD, to accelerate sampling of protein-protein complexes.
In five out of six cases this method identifies a near-native complex structure, remarkably starting from an initial structure (a decoy) in which the ligand is bound on the opposite side of the actual binding site. However, these simulations required several microseconds of total simulation time for convergence. Even with the implicit solvent model used, this translates into a wall-clock time in excess of one week on a single GPU for a single complex. The procedure could be accelerated considerably by starting from docking poses closer to the native structure, and in 16 out of 20 cases a near-native binding mode could be established [51]. In another study, the authors use the same method with explicit solvent and a larger set of 36 protein-protein complexes, and also calculate relative binding free energies by analyzing the REMD simulations with MBAR. Because of the restraints used and the non-exhaustive sampling, these are not true free energies and should rather be considered physics-based scoring functions; nevertheless, the resulting Pearson correlation between calculated and experimental binding affinities (r = 0.77) is rather good [52]. Overall, such results for a given protein complex, starting from 50 decoys, can be generated in about 1–2 weeks on a single workstation with a GPU.
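Umbrella sampling data are typically unbiased with the weighted histogram analysis method (WHAM) or with MBAR. The following is a minimal one-dimensional WHAM sketch, assuming harmonic umbrella potentials U_i(ξ) = ½k(ξ − ξ_i)² along a single reaction coordinate ξ (for instance an inter-protein distance) and energies in units of kT (beta = 1). It is not the analysis code of Zacharias and co-workers and ignores the restraint corrections described above.

```python
import numpy as np


def wham_1d(samples, centers, k, edges, beta=1.0, tol=1e-7, max_iter=50_000):
    """Minimal 1D WHAM for harmonic umbrellas; returns bin midpoints and the PMF (units of 1/beta)."""
    centers = np.asarray(centers, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # counts[i, b]: number of samples from window i that fall into bin b
    counts = np.array([np.histogram(s, bins=edges)[0] for s in samples], dtype=float)
    n_win = counts.sum(axis=1)   # N_i, samples per window inside the histogram range
    total = counts.sum(axis=0)   # sum over windows of the counts in each bin
    u = beta * 0.5 * k * (mids[None, :] - centers[:, None]) ** 2  # reduced bias energies
    f = np.zeros(len(centers))   # reduced window free energies beta * f_i
    for _ in range(max_iter):
        denom = (n_win[:, None] * np.exp(f[:, None] - u)).sum(axis=0)
        p = total / denom
        p /= p.sum()
        f_new = -np.log((p[None, :] * np.exp(-u)).sum(axis=1))
        f_new -= f_new[0]        # fix the arbitrary additive constant
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    with np.errstate(divide="ignore"):
        pmf = -np.log(p) / beta  # empty bins become +inf
    pmf -= pmf[np.isfinite(pmf)].min()
    return mids, pmf


if __name__ == "__main__":
    # synthetic check: a flat free-energy landscape sampled with 21 umbrellas
    rng = np.random.default_rng(0)
    k = 10.0
    centers = np.arange(0.0, 5.01, 0.25)
    samples = [rng.normal(c, 1.0 / np.sqrt(k), 20_000) for c in centers]
    mids, pmf = wham_1d(samples, centers, k, np.linspace(-1.0, 6.0, 71))
    inner = (mids > 0.5) & (mids < 4.5)
    print(pmf[inner].max() - pmf[inner].min())  # should be small, since the true PMF is flat
```

The synthetic check works because, on a flat landscape, each harmonic window samples a Gaussian of variance 1/(βk); for real protein-protein systems, the overlap between neighbouring windows, and hence convergence, is exactly what short per-window simulations put at risk.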
An approach conceptually similar to the RS-REMD mentioned above is scaled MD (SMD), as suggested by Scafuri et al. [53] Comparable to approaches using elevated temperatures to accelerate the dissociation of non-native complexes, here this is achieved by scaling the total potential in short MD simulations by a factor λ < 1. The authors also define a descriptor by dividing the buried hydrophobic surface area by the RMSD (with respect to the initial structure) of the interfacial residues. Using this descriptor to re-rank 20 diverse binding poses for each of eight different complexes, they rank a near-native structure first (out of 20) in five out of eight cases, while the original (HADDOCK) docking score does so in only one case [54].
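The descriptor can be sketched as follows. We assume, for illustration only, that coordinates are already superimposed, that higher descriptor values indicate more native-like poses, and that a small floor (`min_rmsd`, a hypothetical parameter) prevents division by zero; the surface areas and RMSDs are made up and not taken from [54].

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two pre-aligned lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(sq_sum / len(coords_a))


def smd_descriptor(buried_hydrophobic_area, interface_rmsd, min_rmsd=0.1):
    """Buried hydrophobic area (Å^2) divided by interfacial RMSD (Å).

    min_rmsd guards against division by (near) zero; its value is arbitrary.
    """
    return buried_hydrophobic_area / max(interface_rmsd, min_rmsd)


def rerank(poses):
    """Pose names sorted by descriptor, highest (assumed most native-like) first."""
    return sorted(poses, key=lambda name: smd_descriptor(*poses[name]), reverse=True)


# Interfacial RMSD of a (toy) pose after a short scaled-MD run.
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
after_md = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(initial, after_md))

# Hypothetical poses: name -> (buried hydrophobic area in Å^2, interfacial RMSD in Å).
poses = {"pose_A": (850.0, 1.2), "pose_B": (900.0, 4.5), "pose_C": (400.0, 0.8)}
print(rerank(poses))
```

Pose B buries the most hydrophobic surface but drifts strongly, so it drops to the bottom of the ranking; this is the behavior the ratio is meant to capture.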
In an attempt to improve not only scoring but also sampling, Wang et al. used ClusPro PeptiDock followed by Gaussian accelerated MD (GaMD) simulations to generate models for a number of protein-peptide complexes. Only three cases were considered altogether, for which the RMSD values of the peptide backbone atoms relative to the crystal structures decrease from 3.3–4.8 Å after docking to 0.6–2.7 Å after refinement via GaMD [55].
Perthold and Oostenbrink performed a comparatively comprehensive study using fast-pulling MD simulations (non-equilibrium MD analyzed with the Jarzynski equality) to rank a large set of docking poses for each of 18 PP complexes with available experimental structures. Considering the computational effort required for these calculations (several days on a single GPU for a set of about 1000 decoys), the improvement compared to simpler and cheaper scoring algorithms appears to be moderate [56].
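The fast-pulling approach rests on the Jarzynski equality, ΔF = −kT ln⟨exp(−W/kT)⟩, which relates the exponential average over many non-equilibrium work values W to an equilibrium free energy difference. A minimal sketch with hypothetical work values (not data from [56]), using the log-sum-exp trick for numerical stability:

```python
import math


def jarzynski_free_energy(works, kT):
    """Free energy difference from non-equilibrium work values (Jarzynski equality).

    dF = -kT * ln( (1/N) * sum_i exp(-W_i / kT) ), evaluated with log-sum-exp.
    works and kT must be in the same energy unit (here kJ/mol).
    """
    if not works:
        raise ValueError("need at least one work value")
    exponents = [-w / kT for w in works]
    m = max(exponents)
    log_mean = m + math.log(sum(math.exp(e - m) for e in exponents)) - math.log(len(works))
    return -kT * log_mean


kT = 2.494  # kJ/mol at about 300 K
# Hypothetical pulling works (kJ/mol) from repeated dissociation simulations.
works = [52.3, 48.9, 61.0, 45.2, 57.7, 50.1]
dF = jarzynski_free_energy(works, kT)
mean_w = sum(works) / len(works)
print(f"Jarzynski dF = {dF:.1f} kJ/mol, mean work = {mean_w:.1f} kJ/mol")
```

The estimate always lies between the smallest and the mean work value, and it is dominated by rare low-work trajectories. This is one reason why many pulls per pose, and hence considerable GPU time, are needed for a converged result.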
Liao et al. performed a study using MD simulation based modeling to predict a small number of ternary complex structures, including PROTACs [57]. The authors established a pipeline involving docking/scoring (using ClusPro [58]), followed by short MD simulations, clustering, and MM-GBSA based pre-scoring. For a small set of candidate poses, the final scores were then obtained from explicit-solvent MD simulations by measuring the average pose occupancy fraction while gradually raising the simulation temperature. For all four test cases considered, the best scoring pose turns out to be a near-native one. This result looks rather promising, but one has to keep in mind that not only was the number of test cases rather small, but the overall pipeline is also complex and characterized by some seemingly ad-hoc choices. For example, the linkers were inserted manually into docked binary complexes, details of the protocol vary somewhat between structures, and only a small number of poses, including the near-native one, were included in the final step for each complex. Thus, the method is interesting, but certainly requires more extensive testing.
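The pose occupancy idea can be illustrated as follows. Purely for illustration, we assume that a pose counts as occupied in a frame when its RMSD from the starting pose stays below a cutoff (2 Å here), and that the score is the average over the temperature stages; the exact definition and cutoff used by Liao et al. may differ, and the RMSD series are hypothetical.

```python
def occupancy_fraction(rmsd_series, cutoff=2.0):
    """Fraction of frames whose RMSD (Å) from the starting pose is within cutoff."""
    if not rmsd_series:
        raise ValueError("empty RMSD series")
    return sum(1 for r in rmsd_series if r <= cutoff) / len(rmsd_series)


def average_occupancy(series_by_temperature, cutoff=2.0):
    """Average occupancy over simulations run at increasing temperatures."""
    fractions = [occupancy_fraction(s, cutoff) for s in series_by_temperature.values()]
    return sum(fractions) / len(fractions)


# Hypothetical RMSD time series (Å) for two candidate poses at three temperatures (K).
pose_1 = {300: [0.8, 1.1, 1.3, 1.2], 330: [1.0, 1.6, 1.9, 2.2], 360: [1.5, 2.4, 3.1, 3.8]}
pose_2 = {300: [1.2, 2.5, 3.0, 3.4], 330: [2.8, 3.9, 4.5, 5.0], 360: [4.1, 5.3, 6.0, 7.2]}
scores = {"pose_1": average_occupancy(pose_1), "pose_2": average_occupancy(pose_2)}
print(max(scores, key=scores.get), scores)
```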
Finally, in addition to the approaches based on molecular simulation, we discuss a reference that uses machine learning (ML) algorithms for pose refinement. Wang et al. proposed a graph neural network model, based on features of the PP interface, that is trained to discriminate between decoys and the native complex structure [59]. They evaluate their model on three different benchmark sets and compare it to other scoring functions designed for the same purpose. For two test sets, which were also used for training, the algorithm ranks a pose of at least acceptable quality (by CAPRI standards) first in about 70% of the cases. For the third (CAPRI) test set this number is not provided, but results appear to be comparable. One issue we see in this paper is that the authors do not make clear to what extent the three data sets overlap (data leakage), nor how training and test sets were selected. Also, they mostly discuss the ability of their algorithm to identify binding poses of acceptable quality, a result that is probably of limited usefulness for SBDD, as discussed in Section 1. However, although no numbers are given, we expect that the speed of this algorithm makes it a good candidate for a filtering step that reduces the number of poses from docking to a number that is manageable by other algorithms for pose refinement.
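The CAPRI quality classes referred to above are defined by the fraction of native contacts (fnat), the ligand RMSD (L-rms) and the interface RMSD (I-rms). The sketch below uses the thresholds as they are commonly stated in the docking literature; it is an illustration, not an official CAPRI implementation. It also shows how a top-n success rate, such as the top-1 rate quoted above, is obtained from ranked poses. The pose metrics are hypothetical.

```python
def capri_class(fnat, l_rms, i_rms):
    """CAPRI quality class from fnat, ligand RMSD (Å) and interface RMSD (Å)."""
    if fnat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rms <= 5.0 or i_rms <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0):
        return "acceptable"
    return "incorrect"


def top_n_success(ranked_classes, n=1):
    """Fraction of targets with an acceptable-or-better pose among the n best ranked."""
    good = {"high", "medium", "acceptable"}
    hits = sum(1 for classes in ranked_classes if any(c in good for c in classes[:n]))
    return hits / len(ranked_classes)


# Hypothetical ranked poses for three targets (best-scored pose first).
targets = [
    [capri_class(0.35, 4.0, 1.8), capri_class(0.05, 15.0, 8.0)],
    [capri_class(0.02, 18.0, 9.0), capri_class(0.6, 0.9, 0.7)],
    [capri_class(0.0, 20.0, 11.0), capri_class(0.08, 12.0, 6.0)],
]
print(top_n_success(targets, n=1), top_n_success(targets, n=2))
```

Target 2 counts as a success only when the second-ranked pose is allowed. This is the gap between "best scoring pose" and "one of the n best scoring poses" that matters when only the top-ranked structure is used in practice.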
The final reference mentioned here, a paper by Jandova et al., is, in our opinion, the most lucid account on this topic [60]. It is written in a way that helps the reader understand important features of PPI dynamics, and it provides a comprehensive discussion of the results that clearly conveys the merits and limitations of the method, whereas many of the other references included above are written in a way that obscures rather than clarifies. The authors collected a set of 25 PP complexes, split into a training set (20) and a test set (5). For the 20 training set complexes they performed short (100 ns) MD simulations of the experimental structure and of four structures from clusters obtained with PP docking, two of which are near-native solutions and two non-native. They find that none of the simulations remain at, or converge towards, the native structure during the 100 ns simulation, which, to some extent, might be a consequence of the short simulation time. However, the results do allow for a discrimination between near-native initial structures and non-native decoys, based on the variation of the structures during the course of the short MD simulations. In line with a number of other authors mentioned above, they find that a near-native structure generally remains closer to the initial structure during the course of the simulation. They also calculate a number of descriptors for subsets of the simulations and feed them into a random forest model, which they train to discriminate native from non-native models (initial structures). For the two different test sets they use, they find success rates of 60% and 75%, respectively, for a correct identification. Given the relatively small number of test cases and models, the statistical significance of this result is only moderate, but it certainly points in a promising direction.
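The central observation, that near-native starting structures drift less during short MD, can be turned into simple per-trajectory features. The sketch below is a deliberate simplification: instead of the random forest used in [60], it ranks starting structures by their mean RMSD from the initial structure. The feature names, the ranking rule and the RMSD series are our own illustrative assumptions.

```python
import statistics


def drift_features(rmsd_series):
    """Simple descriptors of how far a complex drifts from its starting structure."""
    return {
        "mean": statistics.fmean(rmsd_series),
        "std": statistics.pstdev(rmsd_series),
        "final": rmsd_series[-1],
    }


def rank_by_stability(trajectories):
    """Rank starting structures by mean RMSD from the initial structure, lowest first."""
    return sorted(trajectories, key=lambda name: drift_features(trajectories[name])["mean"])


# Hypothetical interface RMSD time series (Å) from 100 ns runs of four docked models.
trajectories = {
    "model_1": [0.9, 1.4, 1.6, 1.8, 1.7],
    "model_2": [1.2, 2.9, 4.3, 5.6, 6.1],
    "model_3": [1.1, 1.9, 2.2, 2.0, 2.4],
    "model_4": [1.5, 3.8, 5.2, 7.9, 8.8],
}
for name in rank_by_stability(trajectories):
    print(name, drift_features(trajectories[name]))
```

In a setup like that of [60], dictionaries such as those returned by `drift_features`, computed for subsets of each trajectory, would serve as input rows for a trained classifier rather than for a fixed threshold rule.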
4.3. Ranking
4.3.1. Physics based approaches
Most of the approaches discussed in the previous section use relatively short MD simulations to generate descriptors that can be used to refine and rank different binding poses of a given complex. In recent years a number of more rigorous approaches have been proposed, involving extensive MD simulations combined with enhanced sampling to calculate full free energy surfaces of protein-protein binding in solution. A few papers that discuss both the refinement of poses and the scoring of PPIs have already been discussed in the previous section [50], [51], [52], [53], [61] and are therefore not included here. Below we only briefly discuss some other references, focusing on binding free energy prediction. It is important to note that, in spite of the use of accelerated or expanded ensemble MD and comparable methods, the required computational effort is in most cases considerable. Thus, using these methods in an applied research setting to rank the stabilities of different PROTAC based complexes might remain impractical for some time to come.
In an early study, Cuendet et al. investigated the binding free energies of two protein-protein complexes that differ only by a single mutation [62]. Dissociation potentials of mean force were calculated from numerous non-equilibrium MD simulations using the Jarzynski equality. They find that their calculated dissociation free energies largely underestimate experimental values but do reproduce the experimental trend for the two systems. Gumbart et al. calculated the binding free energy of the barnase-barstar complex using NAMD and US-REMD combined with constraints, and various corrections for these constraints. With a total simulation time of about 2 μs, their results show excellent agreement with experiment (−21.0 vs. −19.0 kcal/mol) [63]. Rodriguez et al. calculated the binding free energy of two proteins as the potential of mean force (PMF), at full atomic resolution and in explicit solvent, based on a steered MD simulation [64]. The authors report a calculated binding free energy of 9.2 kcal/mol, which agrees qualitatively with experiment (8.4 kcal/mol). The simulation was performed with NAMD and required about 72 wall-clock hours on a cluster of 40 processors, each with 10 cores. Pan et al. used long MD simulations with an enhanced sampling algorithm to study repeated association/dissociation events for five different protein complexes in explicit solvent [65]. Interestingly, they find that “For the five reversibly associating systems, the most stable complex in the simulation agrees with the complex determined crystallographically within atomic resolution”, suggesting that, at least in these cases, the employed classical force field is accurate enough to identify the native complex. However, the authors compared calculated binding free energies to experimental values for only one of the five complexes. Thus, the ability of this method to estimate relative stabilities remains unclear.
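Several of the free energy calculations above ([50], [63]) rely on umbrella sampling, in which the dissociation coordinate is divided into windows, each restrained by a harmonic bias. A minimal sketch, assuming a center-of-mass (COM) distance as the reaction coordinate and an illustrative force constant (neither taken from the cited papers):

```python
def window_centers(xi_min, xi_max, n_windows):
    """Evenly spaced umbrella window centers (nm) along the reaction coordinate."""
    if n_windows < 2:
        raise ValueError("need at least two windows")
    step = (xi_max - xi_min) / (n_windows - 1)
    return [xi_min + i * step for i in range(n_windows)]


def harmonic_bias(xi, center, k):
    """Umbrella bias w(xi) = 0.5 * k * (xi - center)^2, with k in kJ/(mol nm^2)."""
    return 0.5 * k * (xi - center) ** 2


# Hypothetical setup: COM distance from 2.0 to 4.0 nm in 21 windows (0.1 nm spacing).
centers = window_centers(2.0, 4.0, 21)
k_umbrella = 1000.0  # kJ/(mol nm^2), illustrative value
# Bias felt at the neighbouring window center: about 5 kJ/mol, i.e. roughly 2 kT at 300 K.
print(harmonic_bias(centers[1], centers[0], k_umbrella))
```

Spacing and force constant have to be chosen so that the distributions sampled in neighbouring windows overlap; only then can the windows be combined into a single PMF, for example with WHAM or MBAR.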
Using a combination of Hamiltonian replica exchange MD (HREMD) and constraints, Perthold et al. calculated the absolute binding free energies of two different PP complexes. Agreement between experimental and calculated energies was good. The overall simulation time was on the order of 10 μs for one complex [66].
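Both RS-REMD [51], [52] and the HREMD used by Perthold et al. exchange configurations between replicas that differ in their Hamiltonians rather than in their temperatures. The standard Metropolis criterion for such a swap at a common temperature is sketched below. The energies are hypothetical, and the way the replica Hamiltonians are modified (e.g., scaled repulsion) is not modeled.

```python
import math
import random


def hremd_acceptance(u_i_xi, u_i_xj, u_j_xi, u_j_xj, kT):
    """Metropolis acceptance probability for swapping configurations x_i and x_j
    between replicas with Hamiltonians U_i and U_j at the same temperature.

    u_a_xb is the energy of configuration x_b evaluated with Hamiltonian U_a.
    """
    delta = ((u_i_xj + u_j_xi) - (u_i_xi + u_j_xj)) / kT
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swap(energies, kT, rng):
    """energies = (u_i_xi, u_i_xj, u_j_xi, u_j_xj); True if the swap is accepted."""
    return rng.random() < hremd_acceptance(*energies, kT)


rng = random.Random(42)
# Hypothetical energies (kJ/mol); replica j uses a softened (scaled) Hamiltonian.
energies = (-1000.0, -995.0, -900.0, -902.0)
print(hremd_acceptance(*energies, kT=2.494), attempt_swap(energies, 2.494, rng))
```

Because only energy differences enter the criterion, replicas whose Hamiltonians differ too strongly rarely exchange; in practice this is why many replicas, and thus many microseconds of aggregate simulation time, are needed.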
An approach for calculating not only binding free energies but also binding kinetics (free energy barriers) is the multi-ensemble Markov model (MEMM) framework in combination with accelerated MD (aMD) [67]. The total simulation time for a proof-of-concept study of a single protein-peptide complex amounted to 6 μs, consuming about 40,000 GPU hours. Plattner et al. obtained similar results for the barnase-barstar system: using a Markov state model (MSM) and extensive MD simulations (tens of μs), they characterized the binding energetics and kinetics, as well as details of the binding process [68].
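The MSM idea behind [67] and [68] can be illustrated on a toy discretized trajectory. Transitions between states are counted at a fixed lag time, the counts are row-normalized into a transition matrix, and the stationary distribution gives the state populations. This is a bare-bones sketch with a hypothetical two-state trajectory (0 = unbound, 1 = bound). Production MSM tools use reversible maximum-likelihood estimators and careful state discretization, both omitted here, and the resulting free energy difference refers to the simulation box only, without a standard-state correction.

```python
import math


def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition count matrix from a discrete trajectory."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0.0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Stationary distribution pi (pi = pi T) by power iteration (assumes an aperiodic chain)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical discretized trajectory: 0 = unbound, 1 = bound.
dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
T = transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)
kT = 2.494  # kJ/mol at about 300 K
dG_box = -kT * math.log(pi[1] / pi[0])  # box-dependent, no standard-state correction
print(pi, dG_box)
```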
In a very interesting paper, Jost Lopez et al. perform a rigorous derivation for calculating KD values from simulations, including terms related to the standard-state dependence and to the second osmotic virial coefficient (B22, long-range interactions) that are normally not considered in this context [69]. Interestingly, they find that these terms are of particular importance for weakly interacting proteins, a situation that we are probably facing for most PROTAC based ternary complexes.
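For orientation, the textbook relation between a dissociation constant and the standard binding free energy is ΔG° = RT ln(KD/c°), with c° = 1 mol/L. The sketch below covers only this standard-state conversion; the osmotic virial (B22) term discussed by Jost Lopez et al. is not included.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
C_STANDARD = 1.0      # standard concentration, 1 mol/L


def dg_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R_KCAL * temperature * math.log(kd_molar / C_STANDARD)


def kd_from_dg(dg_kcal, temperature=298.15):
    """Inverse of dg_from_kd: dissociation constant (mol/L) from dG (kcal/mol)."""
    return C_STANDARD * math.exp(dg_kcal / (R_KCAL * temperature))


for kd in (1e-9, 1e-6, 1e-3):
    print(f"K_D = {kd:.0e} M  ->  dG = {dg_from_kd(kd):.1f} kcal/mol")
```

A millimolar KD corresponds to only about −4 kcal/mol. For such weak binders, corrections of a fraction of a kcal/mol already change the predicted KD noticeably, which illustrates why the additional terms matter most for weakly interacting proteins.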
Finally, in a very recent paper, Wang et al. used GaMD to predict protein binding, including kinetics [70]. The results are reasonably accurate, but, in spite of GaMD probably being one of the most efficient methods for enhanced sampling, simulations for a single PP complex still require at least several microseconds of simulation time to converge. Nevertheless, this might be an interesting alternative to assist SAR generation at the lead optimization stage.
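For reference, the boost potential in GaMD, as commonly described in the GaMD literature, is a harmonic term added only when the system potential V lies below a threshold E: ΔV = ½ k (E − V)² for V < E, and 0 otherwise, with k = k0/(Vmax − Vmin). The sketch below uses the lower-bound choice E = Vmax and treats k0 as a free parameter. In GaMD proper, k0 is derived from the potential statistics and a user-set limit on the standard deviation of the boost; that step is omitted here, and the energy range is hypothetical.

```python
def gamd_boost(v, v_min, v_max, k0=1.0):
    """GaMD-style harmonic boost with threshold E = V_max (lower-bound option).

    dV = 0.5 * k * (E - V)^2 if V < E else 0, with k = k0 / (V_max - V_min).
    """
    if not 0.0 < k0 <= 1.0:
        raise ValueError("k0 must lie in (0, 1]")
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    k = k0 / (v_max - v_min)
    threshold = v_max
    return 0.5 * k * (threshold - v) ** 2 if v < threshold else 0.0


# Hypothetical potential energy range (kJ/mol) sampled in a short conventional MD run.
v_min, v_max = -52000.0, -51000.0
for v in (-52000.0, -51500.0, -51000.0):
    print(v, v + gamd_boost(v, v_min, v_max))
```

With k0 ≤ 1 and E = Vmax, the boosted potential V + ΔV stays non-decreasing in V over the sampled range [Vmin, Vmax]. Low-energy basins are thus raised and barriers flattened while the energetic ordering of states is preserved, which is what makes the subsequent reweighting tractable.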
4.3.2. Approaches based on machine learning
Recently, Li et al. introduced DeepPROTACs, a deep neural network model that can efficiently predict the degradation efficacy of a given PROTAC based on the structures of the POI and the E3 ligase [71]. For that purpose, they combined a public database, PROTAC-DB, with other public sources, resulting in 2832 degrader data points, and reported an area under the receiver operating characteristic curve (AUROC) of 0.847 on test sets. Even though this is a promising result, there are some problems that the authors themselves discuss. For one of the models with the best score (AUROC), they only include parts of the proteins closer than 5 Å to the warhead or the E3-binder. It has been proposed that the distance between solvent-exposed lysines on the protein of interest and the E3-ligase complex might be a significant factor for degradation efficacy [72], [73]. A threshold of 5 Å might be sufficient in some cases, but it is not clear whether the performance would hold when generalizing to the whole proteome (i.e., the 660 known ligases). An additional limitation of the model might be related to the train/test split, since this split is performed randomly. A random split does not ensure that there is no overlap between the protein-ligase pairs in the training and test sets. Therefore, in some cases the model might be trained on the same protein-ligase pair for which it then predicts the outcome, which compromises the assessment of its generalizability. Finally, in the ablation study shown in Fig. 4 of their paper, the authors claim that removing linker, E3-binder, or warhead information can still lead to AUROC values around 0.8. It is generally acknowledged that these elements of the PROTAC molecule are essential, and it is unclear how a model can still be predictive without this information.
For example, if one ignores the linker information, how could this model predict relative potencies for a series of PROTACs that differ only in their linkers, which is arguably the most common scenario in early stage drug design for PROTACs? In spite of these points, the work still sets a good benchmark for the computational investigation of degradation efficacy with ML based models.
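One way to avoid the overlap between training and test sets described above is to split the data by (POI, E3 ligase) pair rather than by individual data point. A minimal sketch, with hypothetical record fields `poi`, `e3` and `active` (not the DeepPROTACs data format) and made-up labels:

```python
import random


def grouped_split(records, test_fraction=0.2, seed=0):
    """Split records so that no (POI, E3 ligase) pair occurs in both train and test sets."""
    groups = sorted({(r["poi"], r["e3"]) for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [r for r in records if (r["poi"], r["e3"]) not in test_groups]
    test = [r for r in records if (r["poi"], r["e3"]) in test_groups]
    return train, test


# Hypothetical degrader records (labels are made up).
records = [
    {"poi": "BRD4", "e3": "VHL", "active": 1},
    {"poi": "BRD4", "e3": "VHL", "active": 0},
    {"poi": "BRD4", "e3": "CRBN", "active": 1},
    {"poi": "BTK", "e3": "CRBN", "active": 1},
    {"poi": "BTK", "e3": "CRBN", "active": 0},
    {"poi": "AR", "e3": "VHL", "active": 1},
]
train, test = grouped_split(records, test_fraction=0.34, seed=1)
print(len(train), len(test))
```

A stricter variant would hold out all records of a given POI (or E3 ligase), which tests generalization to targets the model has never seen.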
Zheng et al. described a pipeline for ranking PROTACs (similar to Fig. 2) [14]. The entire pipeline took a total of 49 days. They start with linker generation, for which they use the PROTAC-RL algorithm. The resulting PROTACs are then filtered using different filters and clustering techniques. They apply the PRosettaC protocol to the remaining PROTAC molecules and select the poses with the best Rosetta score for further MD simulations [40]. Ultimately, they pick the best PROTACs based on their Rosetta score, MM-GBSA score, and synthetic accessibility. For the BRD4 case study, they experimentally tested six PROTACs (out of 5000 generated), of which three were validated by cell-based assays and western blot analysis; one lead candidate was tested further and even demonstrated favorable pharmacokinetics in mice. One problem they do not address is the use of bound structures when performing the PRosettaC protocol, even though Zaidman et al. state in their paper that PRosettaC fails to rank near-native clusters when starting from unbound structures. Moreover, for the BRD4 case study they take the E3-ligase and POI bound structures from an experimental TC (PDB: 6BOY), which Zaidman et al. used as a training example for hyper-parameter tuning of PRosettaC.
Fig. 2.
Example of a PROTAC screening pipeline. Gray boxes: experimental steps/input data; pink boxes: steps performed in-silico. The protein of interest (POI), a suitable E3-ligase, and molecular fragments binding to the POI and the ligase, respectively, are provided as input. The PROTAC molecules are then generated by combining these fragments with a linker, either taken from an existing library or designed by a generative algorithm. The latter can be trained to produce compounds with a given range of properties, including geometric features (e.g., linker length, rigidity), various physicochemical properties (e.g., solubility, permeability), and/or other features relevant in drug design (e.g., toxicity, synthesizability). The resulting PROTAC library can then be screened for the expected ternary complex stability using an algorithm for ternary complex structure prediction and ranking of complexes, as discussed in the text.
Irrespective of the problems specified above, we expect such methods to provide valuable insights. With more experimental data for validation becoming available, methods can be improved, and eventually this should allow us to shift a good portion of the work involved in early stage PROTAC design from the lab to the computer, and accelerate drug development accordingly.
5. Discussion
An incomplete list of selected models, summarizing the variety of approaches discussed here, is provided in Table 1. Although it is early days, the use of in-silico models to estimate properties like solubility and permeability appears to provide results that are at least accurate enough for early stage screening of compound libraries. Regarding the generation of such libraries, a number of algorithms for de-novo design are available. Ternary complex prediction followed by the application of simple and fast scoring functions (the equivalent of docking and scoring for TSMDs) to rank PROTACs with respect to the stability of the ternary complex rarely provides structures that are sufficiently close to experiment. This applies in particular if we argue that only the best scoring pose (as opposed to, e.g., one of the 10 best scoring poses) is relevant in practice. A good number of algorithms have been designed and tested for refining docking results for binary or ternary protein complexes, and some of the results look promising, suggesting that, using refinement, one can achieve success rates (in terms of generating near-native poses) of up to 75% in favorable cases. This is still far from perfect, but better than the results typically obtained with docking/scoring of static structures only. One has to keep in mind, though, that most of the refinement methods discussed here require computational resources that only allow for screening hundreds, perhaps a few thousand, compounds with state-of-the-art hardware and software.
Table 1.
A selection of in-silico tools and algorithms discussed in the text. More elaborate physics based ranking methods for protein complex structures are not included here but are discussed in the text (Section 4.3).
Comment | ref |
---|---|
Linker design | |
DeLinker, generative model | [12] |
SyntaLinker, generative model | [13] |
PROTAC-RL, generative model and reinforcement learning | [14] |
Solubility | |
11 bRo5 drugs, maximum molecular 3D polar surface area | [28] |
FEP/ML based, TSMDs only | [30] |
Permeability | |
24 bRo5 cmpds, minimum 3D-PSA | [28] |
60 macrocycles, LR & RF models, 2D and 3D | [33] |
SA from MD in chloroform, only 3 cmpds | [34] |
ML based model applied to PROTACs | [35] |
macrocycles, transfer free energies from MD | [36] |
PROTAC design & filtering | |
DeepPROTACs, ML based, limited to bound structures | [71] |
full design pipeline, ML and MD simulation | [14] |
Protein complex structure prediction | |
MOE based workflow, bound input | [38], [74] |
PRosettaC, based on ROSETTA, bound input | [40] |
Frodock + refinement uses unbound input | [42] |
BO based sampling, short MD based refinement | [43] |
Complex pose ranking and stability prediction | |
Short explicit solvent MD, raised temperature, only 2 cases | [48] |
Short US MD w/ constraints, implicit solvent | [50] |
REMD, in explicit solvent, 30 cases | [52] |
GaMD simulations, 3 cases | [55] |
Fast graph neural network model | [59] |
Short MD, pose occupancy time | [60] |
PROTAC design pipeline, including linker generation & scoring | [14] |
One potential issue rarely discussed in the literature on (binary or ternary) protein complex prediction is that crystal structures might be suboptimal as references when validating methods. It is not clear how much crystal structures of proteins can differ from the corresponding structures in solution, and usually only the latter are of interest in the pharmaceutical context. For structures of single proteins, and if we ignore special cases like transmembrane proteins, in our experience these differences are usually small and confined to solvent-exposed side chains. For protein complexes the situation is less clear, in particular for comparatively weakly bound complexes. For PROTAC based ternary complexes the situation is probably even worse, since here we are usually facing two proteins that were not designed (by evolution) to bind to one another. This fact, in combination with the presence of the often very flexible PROTAC linker, might result in ternary complexes that do not have a single well-defined structure, but would be better represented by a dynamic ensemble of structures. Based on this hypothesis, we expect that MD based methods employing enhanced sampling algorithms, as discussed in Section 4.2, might be required to obtain structures that resemble in-vivo ternary complexes closely enough to be of use in structure based drug design. At the time of writing, neither NMR nor cryo-EM structures of PROTAC based ternary complexes are available. Such experimental results would certainly be very useful, as they could help to verify (or falsify) the hypothesis mentioned here.
By the same token, ranking TC stability for different PROTACs might also provide suboptimal results when only static structures are used. As discussed in Section 4.3, methods based on advanced MD simulation algorithms that can predict absolute binding affinities do exist. Although work in this field has been confined nearly exclusively to binary PP complexes, the generalization of these methods to TCs and PROTACs is probably straightforward, as most of them are based on physics rather than on empirical/statistical scoring functions that have been trained on a particular set of compounds and therefore have limited transferability. However, most of these methods are computationally expensive to an extent that precludes their use for high throughput virtual screening. The methods for enhanced sampling discussed here cover only a fraction of the numerous algorithms proposed for this purpose in recent years, as we focus on their application to the refinement and scoring of protein-protein complexes. Generally, the development of enhanced sampling algorithms is a field of active research, and it might be interesting to consider new/alternative methods for PROTAC design.
Another alternative for ranking PROTACs according to their efficacy is the use of ML based methods, as discussed in Section 4.3.2. We believe that the amount of experimental data currently available for training and testing such algorithms is insufficient to obtain models that can provide reliable predictions for a wide range of POIs. However, with increasing numbers of PROTAC based drugs approaching the clinic, this situation might improve in the foreseeable future. Also, the development of improved ML algorithms designed to work with scarce data might help to obtain more reliable results. The prospect of getting ML algorithms to work with sufficient accuracy is certainly very attractive. Once trained, ML algorithms are usually fast and would therefore allow for the kind of high throughput virtual screening that has been successfully applied to TSMDs for some time, with the potential for a noticeable acceleration of early stage drug design in the field of PROTAC development.
6. Summary
We find that for the generative de-novo design of PROTAC-like molecules, focusing on their linkers, a number of interesting algorithms have been proposed. If, as in the case of Link-INVENT, such algorithms include molecular descriptors in their scoring functions that can be used to estimate the drug-likeness of candidate molecules, they appear to represent a powerful tool for the efficient generation of molecular libraries. Although little has been published on their application in real-life drug development campaigns, we expect them to be a very useful contribution to the CADD arsenal in PROTAC development.
Also, the calculation of descriptors relating to absorption (solubility and permeability) for PROTAC-like drug candidates should provide results with an accuracy comparable to TSMDs. In particular, ML based approaches for this purpose provide estimates sufficiently accurate for early stage screening, combined with a speed that allows application to large libraries of molecules even with moderate computational resources. As for descriptors and models dealing with distribution, metabolism, excretion and toxicity (the remainder of ADMET), little has been published to date that would be applicable specifically to PROTACs and similar compounds. The same is true for quantities like synthesizability or chemical stability, although here we expect that methods developed and validated for TSMDs are likely to provide useful answers for PROTACs as well.
For ternary complex prediction and scoring of the resulting poses (the equivalent of docking/scoring approaches for TSMDs), it is early days. Methods and algorithms that have been proposed are either very approximate or expensive in terms of the required computational resources. On top of that, and due to the limited availability of experimental data, both in terms of structures and activities, many of these algorithms have only been validated using rather small test sets. We believe that more work is required to improve and validate these methods.
An interesting avenue appears to be the combination of generative linker design with fast ML based estimation of descriptors related to bioavailability, and possibly other ADMET properties, to generate small targeted libraries for further exploration, experimentally and/or in-silico (as in the case of Zheng et al. [14]).
Notwithstanding the problems many of the methods discussed here face (insufficient and/or unbalanced data for training and testing, slow convergence, etc.), some progress has been made. We expect that the arsenal of in-silico algorithms currently available can already provide useful tools to accelerate PROTAC development, especially in ligand based drug design. The development of new algorithms for structure based approaches is an active field of research, with the potential to allow for a sufficiently efficient and accurate estimation of relative ternary complex stabilities in the foreseeable future.
CRediT authorship contribution statement
Tin M. Tunjic: Writing – original draft, Writing – review & editing. Noah Weber: Writing – original draft. Michael Brunsteiner: Conceptualization, Supervision, Writing – original draft, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1.Sakamoto K.M., Kim K.B., Kumagai A., Mercurio F., Crews C.M., Deshaies R.J. Protacs: chimeric molecules that target proteins to the Skp1-Cullin-F box complex for ubiquitination and degradation. Proc Natl Acad Sci USA. 2001;98(15):8554–8559. doi: 10.1073/pnas.141230798. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.M. Békés, D.R. Langley, C.M. Crews, PROTAC targeted protein degraders: the past is prologue, Nature Reviews Drug Discovery 0123456789, iSBN: 0123456789(Jan. 2022). [DOI] [PMC free article] [PubMed]
- 3.Gao H., Sun X., Rao Y. PROTAC technology: opportunities and challenges. ACS Med Chem Lett. 2020;11(3):237–240. doi: 10.1021/acsmedchemlett.9b00597. iSBN: 0000000243568. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Troup R.I., Fallan C., Baud M.G.J. Current strategies for the design of PROTAC linkers: a critical review. Explor Target Anti-Tumor Ther. 2020;1(5):273–312. doi: 10.37349/etat.2020.00018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Gironda-Martínez A., Donckele E.J., Samain F., Neri D. DNA-encoded chemical libraries: a comprehensive review with succesful stories and future challenges. ACS Pharmacol Transl Sci. 2021;4(4):1265–1279. doi: 10.1021/acsptsci.1c00118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lyu J., Wang S., Balius T.E., Singh I., Levit A., Moroz Y.S., et al. Ultra-large library docking for discovering new chemotypes. Nature. 2019;566(7743):224–229. doi: 10.1038/s41586-019-0917-9. number: 7743 Publisher: Nature Publishing Group. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Segler M.H.S., Kogej T., Tyrchan C., Waller M.P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci. 2018;4(1):120–131. doi: 10.1021/acscentsci.7b00512. (publisher: American Chemical Society) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wang J., Ge Y., Xie X.-Q. Development and testing of druglike screening libraries. J Chem Inf Model. 2019;59(1):53–65. doi: 10.1021/acs.jcim.8b00537. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.PROTACpedia 2023.
- 10.Weng G., Cai X., Cao D., Du H., Shen C., Deng Y., et al. PROTAC-DB 2.0: an updated database of PROTACs. Nucleic Acids Res. 2022:gkac946. doi: 10.1093/nar/gkac946. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Brown N., Fiscato M., Segler M.H., Vaucher A.C. GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Model. 2019;59(3):1096–1108. doi: 10.1021/acs.jcim.8b00839. (publisher: American Chemical Society) [DOI] [PubMed] [Google Scholar]
- 12.Imrie F., Bradley A.R., van der Schaar M., Deane C.M. Deep generative models for 3D linker design. J Chem Inf Model. 2020;60(4):1983–1995. doi: 10.1021/acs.jcim.9b01120. (publisher: American Chemical Society) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Yang Y., Zheng S., Su S., Zhao C., Xu J., Chen H. Vol. 11. 2020. SyntaLinker: automatic fragment linking with deep conditional transformer neural networks; pp. 8312–8322. (Chemical Science). (publisher: The Royal Society of Chemistry) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Zheng S., Tan Y., Wang Z., Li C., Zhang Z., Sang X., et al. Accelerated rational PROTAC design via deep learning and molecular simulations. Nat Mach Intell. 2022;4(9):739–748. number: 9 Publisher: Nature Publishing Group. [Google Scholar]
- 15.J. Guo , F. Knuth , C. Margreitter , J.P. Janet , K. Papadopoulos , O. Engkvist , et al. , Link-INVENT: Generative Linker Design with Reinforcement Learning, 2022.
- 16.Blaschke T., Arús-Pous J., Chen H., Margreitter C., Tyrchan C., Engkvist O., et al. REINVENT 2.0: an AI tool for de novo drug design. J Chem Inf Model. 2020;60(12):5918–5922. doi: 10.1021/acs.jcim.0c00915. (publisher: American Chemical Society) [DOI] [PubMed] [Google Scholar]
- 17.D. Nori, C.W. Coley, R. Mercado, De novo PROTAC design using graph-based deep generative models, arXiv:2211.02660 [cs, q-bio] (Nov. 2022).
- 18.S.R. Atance, J.V. Diez, O. Engkvist, S. Olsson, R. Mercado, De novo drug design using reinforcement learning with graph-based deep generative models (Jul. 2021. [DOI] [PubMed]
- 19.R. Mercado , T. Rastemo , E. Lindelöf , G. Klambauer , O. Engkvist , H. Chen , et al. , Graph Networks for Molecular Design, 2020.
- 20.Lipinski C.A., Lombardo F., Dominy B.W., Feeney P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2001;46(1):3–26. doi: 10.1016/s0169-409x(00)00129-0. [DOI] [PubMed] [Google Scholar]
- 21.Doak B.C., Zheng J., Dobritzsch D., Kihlberg J. How beyond rule of 5 drugs and clinical candidates bind to their targets. J Kihlberg, How Beyond. 2016;59(6):2312–2327. doi: 10.1021/acs.jmedchem.5b01286. (publisher: American Chemical Society) [DOI] [PubMed] [Google Scholar]
- 22.Doak B.C., Over B., Giordanetto F., Kihlberg J. Oral druggable space beyond the rule of 5: insights from drugs and clinical candidates. Chem Biol. 2014;21(9):1115–1142. doi: 10.1016/j.chembiol.2014.08.013. [DOI] [PubMed] [Google Scholar]
- 23.Egbert M., Whitty A., Keserű G.M., Vajda S. Why some targets benefit from beyond rule of five drugs. J Med Chem. 2019;62(22):10005–10025. doi: 10.1021/acs.jmedchem.8b01732. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Bergström C.A.S., Charman W.N., Porter C.J.H. Computational prediction of formulation strategies for beyond-rule-of-5 compounds. Adv Drug Deliv Rev. 2016;101:6–21. doi: 10.1016/j.addr.2016.02.005. [DOI] [PubMed] [Google Scholar]
- 25.Cantrill C., Chaturvedi P., Rynn C., PetrigSchaffland J., Walter I., Wittwer M.B. Fundamental aspects of DMPK optimization of targeted protein degraders. Drug Discov Today. 2020;25(6):969–982. doi: 10.1016/j.drudis.2020.03.012. [DOI] [PubMed] [Google Scholar]
- 26. Pike A., Williamson B., Harlfinger S., Martin S., McGinnity D.F. Optimising proteolysis-targeting chimeras (PROTACs) for oral drug delivery: a drug metabolism and pharmacokinetics perspective. Drug Discov Today. 2020;25(10):1793–1800. doi: 10.1016/j.drudis.2020.07.013.
- 27. García Jiménez D., Rossi Sebastiano M., Vallaro M., Mileo V., Pizzirani D., Moretti E., et al. Designing soluble PROTACs: strategies and preliminary guidelines. J Med Chem. 2022;65(19):12639–12649. doi: 10.1021/acs.jmedchem.2c00201.
- 28. Rossi Sebastiano M., Doak B.C., Backlund M., Poongavanam V., Over B., Ermondi G., et al. Impact of dynamically exposed polarity on permeability and solubility of chameleonic drugs beyond the rule of 5. J Med Chem. 2018;61(9):4189–4202. doi: 10.1021/acs.jmedchem.8b00347.
- 29. Bergström C.A.S., Wassvik C.M., Johansson K., Hubatsch I. Poorly soluble marketed drugs display solvation limited solubility. J Med Chem. 2007;50(23):5858–5862. doi: 10.1021/jm0706416.
- 30. Scheen J., Wu W., Mey A.S.J.S., Tosco P., Mackey M., Michel J. Hybrid alchemical free energy/machine-learning methodology for the computation of hydration free energies. J Chem Inf Model. 2020;60(11):5331–5339. doi: 10.1021/acs.jcim.0c00600.
- 31. Matsson P., Kihlberg J. How big is too big for cell permeability? J Med Chem. 2017;60(5):1662–1664. doi: 10.1021/acs.jmedchem.7b00237.
- 32. Pye C.R., Hewitt W.M., Schwochert J., Haddad T.D., Townsend C.E., Etienne L., et al. Nonclassical size dependence of permeation defines bounds for passive adsorption of large drug molecules. J Med Chem. 2017;60(5):1665–1672. doi: 10.1021/acs.jmedchem.6b01483.
- 33. Poongavanam V., Atilaw Y., Ye S., Wieske L.H.E., Erdelyi M., Ermondi G., et al. Predicting the permeability of macrocycles from conformational sampling – limitations of molecular flexibility. J Pharm Sci. 2021;110(1):301–313. doi: 10.1016/j.xphs.2020.10.052.
- 34. Poongavanam V., Atilaw Y., Siegel S., Giese A., Lehmann L., Meibom D., et al. Linker-dependent folding rationalizes PROTAC cell permeability. J Med Chem. 2022;65(19):13029–13040. doi: 10.1021/acs.jmedchem.2c00877.
- 35. Poongavanam V., Kölling F., Giese A., Göller A.H., Lehmann L., Meibom D., et al. Predictive modeling of PROTAC cell permeability with machine learning. 2022.
- 36. Kamenik A.S., Kraml J., Hofer F., Waibl F., Quoika P.K., Kahler U., et al. Macrocycle cell permeability measured by solvation free energies in polar and apolar environments. J Chem Inf Model. 2020;60(7):3508–3517. doi: 10.1021/acs.jcim.0c00280.
- 37. Thai N.Q., Theodorakis P.E., Li M.S. Fast estimation of the blood–brain barrier permeability by pulling a ligand through a lipid membrane. J Chem Inf Model. 2020;60(6):3057–3067. doi: 10.1021/acs.jcim.9b00834.
- 38. Drummond M.L., Williams C.I. In silico modeling of PROTAC-mediated ternary complexes: validation and application. J Chem Inf Model. 2019;59(4):1634–1644. doi: 10.1021/acs.jcim.8b00872.
- 39. Molecular Operating Environment. 2022.
- 40. Zaidman D., Prilusky J., London N. PRosettaC: Rosetta based modeling of PROTAC mediated ternary complexes. J Chem Inf Model. 2020;60(10):4894–4903. doi: 10.1021/acs.jcim.0c00589.
- 41. Vekilov P.G., Chung S., Olafson K.N. Shape change in crystallization of biological macromolecules. MRS Bull. 2016;41(5):375–380.
- 42. Weng G., Li D., Kang Y., Hou T. Integrative modeling of PROTAC-mediated ternary complexes. J Med Chem. 2021;64(21):16271–16281. doi: 10.1021/acs.jmedchem.1c01576.
- 43. Rao A., Tunjic T.M., Brunsteiner M., Müller M., Fooladi H., Weber N. Bayesian optimization for ternary complex prediction (BOTCP). bioRxiv 2022.06.03.494737 (New Results). Jun. 2022.
- 44. Han L., Yang Q., Liu Z., Li Y., Wang R. Development of a new benchmark for assessing the scoring functions applicable to protein–protein interactions. London, UK: Future Science Ltd; Jun. 2018.
- 45. Li H., Yan Y., Zhao X., Huang S.-Y. Inclusion of desolvation energy into protein–protein docking through atomic contact potentials. J Chem Inf Model. 2022;62(3):740–750. doi: 10.1021/acs.jcim.1c01483.
- 46. Hou Q., Lensink M.F., Heringa J., Feenstra K.A. CLUB-MARTINI: selecting favourable interactions amongst available candidates, a coarse-grained simulation approach to scoring docking decoys. PLOS ONE. 2016;11(5). doi: 10.1371/journal.pone.0155251.
- 47. Zacharias M. Protein-protein docking refinement using restraint molecular dynamics simulations. TASK Q. 2016;20(4):353–360.
- 48. Radom F., Plückthun A., Paci E. Assessment of ab initio models of protein complexes by molecular dynamics. PLOS Comput Biol. 2018;14(6). doi: 10.1371/journal.pcbi.1006182.
- 49. Shinobu A., Takemura K., Matubayasi N., Kitao A. Refining evERdock: improved selection of good protein-protein complex models achieved by MD optimization and use of multiple conformations. J Chem Phys. 2018;149(19). doi: 10.1063/1.5055799.
- 50. Siebenmorgen T., Zacharias M. Evaluation of predicted protein-protein complexes by binding free energy simulations. J Chem Theory Comput. 2019;15(3):2071–2086. doi: 10.1021/acs.jctc.8b01022.
- 51. Siebenmorgen T., Engelhard M., Zacharias M. Prediction of protein–protein complexes using replica exchange with repulsive scaling. J Comput Chem. 2020;41(15):1436–1447. doi: 10.1002/jcc.26187.
- 52. Siebenmorgen T., Zacharias M. Efficient refinement and free energy scoring of predicted protein–protein complexes using replica exchange with repulsive scaling. J Chem Inf Model. 2020;60(11):5552–5562. doi: 10.1021/acs.jcim.0c00853.
- 53. Scafuri N., Soler M.A., Spitaleri A., Rocchia W. Enhanced molecular dynamics method to efficiently increase the discrimination capability of computational protein–protein docking. J Chem Theory Comput. 2021;17(11):7271–7280. doi: 10.1021/acs.jctc.1c00789.
- 54. Dominguez C., Boelens R., Bonvin A.M.J.J. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003;125(7):1731–1737. doi: 10.1021/ja026939x.
- 55. Wang J., Alekseenko A., Kozakov D., Miao Y. Improved modeling of peptide-protein binding through global docking and accelerated molecular dynamics simulations. Front Mol Biosci. 2019;6. doi: 10.3389/fmolb.2019.00112.
- 56. Perthold J.W., Oostenbrink C. GroScore: accurate scoring of protein–protein binding poses using explicit-solvent free-energy calculations. J Chem Inf Model. 2019;59(12):5074–5085. doi: 10.1021/acs.jcim.9b00687.
- 57. Liao J., Nie X., Unarta I.C., Ericksen S.S., Tang W. In silico modeling and scoring of PROTAC-mediated ternary complex poses. J Med Chem. 2022;65(8):6116–6132. doi: 10.1021/acs.jmedchem.1c02155.
- 58. Kozakov D., Hall D.R., Xia B., Porter K.A., Padhorny D., Yueh C., et al. The ClusPro web server for protein–protein docking. Nat Protoc. 2017;12(2):255–278. doi: 10.1038/nprot.2016.169.
- 59. Wang X., Flannery S.T., Kihara D. Protein docking model evaluation by graph neural networks. Front Mol Biosci. 2021;8:647915. doi: 10.3389/fmolb.2021.647915.
- 60. Jandova Z., Vargiu A.V., Bonvin A.M. Native or non-native protein-protein docking models? Molecular dynamics to the rescue. J Chem Theory Comput. 2021;17(9):5944–5954. doi: 10.1021/acs.jctc.1c00336.
- 61. Siebenmorgen T., Zacharias M. Computational prediction of protein–protein binding affinities. WIREs Comput Mol Sci. 2020;10(3). doi: 10.1002/wcms.1448.
- 62. Cuendet M.A., Michielin O. Protein-protein interaction investigated by steered molecular dynamics: the TCR-pMHC complex. Biophys J. 2008;95(8):3575–3590. doi: 10.1529/biophysj.108.131383.
- 63. Gumbart J.C., Roux B., Chipot C. Efficient determination of protein–protein standard binding free energies from first principles. J Chem Theory Comput. 2013;9(8):3789–3798. doi: 10.1021/ct400273t.
- 64. Rodriguez R.A., Yu L., Chen L.Y. Computing protein-protein association affinity with hybrid steered molecular dynamics. J Chem Theory Comput. 2015;11(9):4427–4438. doi: 10.1021/acs.jctc.5b00340.
- 65. Pan A.C., Jacobson D., Yatsenko K., Sritharan D., Weinreich T.M., Shaw D.E. Atomic-level characterization of protein–protein association. Proc Natl Acad Sci USA. 2019;116(10):4244–4249. doi: 10.1073/pnas.1815431116.
- 66. Perthold J.W., Oostenbrink C. Simulation of reversible protein–protein binding and calculation of binding free energies using perturbed distance restraints. J Chem Theory Comput. 2017;13(11):5697–5708. doi: 10.1021/acs.jctc.7b00706.
- 67. Paul F., Wehmeyer C., Abualrous E.T., Wu H., Crabtree M.D., Schöneberg J., et al. Protein-peptide association kinetics beyond the seconds timescale from atomistic simulations. Nat Commun. 2017;8(1):1095. doi: 10.1038/s41467-017-01163-6.
- 68. Plattner N., Doerr S., De Fabritiis G., Noé F. Complete protein–protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling. Nat Chem. 2017;9(10):1005–1011. doi: 10.1038/nchem.2785.
- 69. Jost Lopez A., Quoika P.K., Linke M., Hummer G., Köfinger J. Quantifying protein–protein interactions in molecular simulations. J Phys Chem B. 2020;124(23):4673–4685. doi: 10.1021/acs.jpcb.9b11802.
- 70. Wang J., Miao Y. Protein–protein interaction-Gaussian accelerated molecular dynamics (PPI-GaMD): characterization of protein binding thermodynamics and kinetics. J Chem Theory Comput. 2022;18(3):1275–1285. doi: 10.1021/acs.jctc.1c00974.
- 71. Li F., Hu Q., Zhang X., Sun R., Liu Z., Wu S., et al. DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs. Nat Commun. 2022;13(1):7133. doi: 10.1038/s41467-022-34807-3.
- 72. Bai N., Riching K.M., Makaju A., Wu H., Acker T.M., Ou S.-C., et al. Modeling the CRL4A ligase complex to predict target protein ubiquitination induced by cereblon-recruiting PROTACs. J Biol Chem. 2022;298(4). doi: 10.1016/j.jbc.2022.101653.
- 73. Ramachandran S., Ciulli A. Building ubiquitination machineries: E3 ligase multi-subunit assembly and substrate targeting by PROTACs and molecular glues. Curr Opin Struct Biol. 2021;67:110–119. doi: 10.1016/j.sbi.2020.10.009.
- 74. Drummond M.L., Henry A., Li H., Williams C.I. Improved accuracy for modeling PROTAC-mediated ternary complex formation and targeted protein degradation via new in silico methodologies. J Chem Inf Model. 2020;60(10):5234–5254. doi: 10.1021/acs.jcim.0c00897.