Abstract
Proteolysis-targeting chimeras (PROTACs) are promising next-generation therapeutics for the degradation of disease-associated proteins. However, optimizing the physicochemical properties of PROTACs, particularly their poor cell membrane permeability, remains challenging. Traditionally, PROTAC linkers have been manually designed to improve cell membrane permeability. Although recent machine learning-based approaches have enabled the rational design of PROTAC linkers, no linker design methods that explicitly address cell membrane permeability have been reported. In this study, we developed PROTAC-TS, a linker generative model that combines a chemical language model and reinforcement learning to control cell membrane permeability. We first constructed a prediction model of cell membrane permeability, which achieved high prediction performance (R² = 0.710). By integrating this prediction model into the generative model, we successfully designed linkers of PROTACs with high predicted cell membrane permeability while considering PROTAC likeness. Our results highlight the potential of PROTAC-TS in accelerating PROTAC development with favorable cell membrane permeability.
Keywords: PROTAC, cell membrane permeability, linker design, machine learning, molecular generation


1. Introduction
Proteolysis-targeting chimeras (PROTACs) have garnered significant interest as next-generation therapeutics capable of degrading disease-associated proteins of interest (POIs). PROTACs are heterobifunctional molecules comprising an E3 ligase-binding moiety (E3 ligand), a POI-binding ligand, and a linker. This architecture enables PROTACs to form degradation-competent ternary complexes with E3 ligases and POIs, thereby catalytically inducing POI degradation by hijacking the ubiquitin–proteasome system (UPS). Furthermore, because PROTACs rely on induced proximity between a POI and an E3 ligase rather than on binding to a functional active site, they have the potential to target disease-associated proteins that lack conventional ligand-binding pockets and have therefore been considered undruggable. Currently, several PROTACs, such as ARV-110 and ARV-471, are being evaluated in clinical trials.
Despite their promise, optimizing the physicochemical properties of PROTACs remains a significant challenge, especially their poor cell membrane permeability, which leads to poor bioavailability and an increased risk of preclinical or clinical failure. Cell membrane permeability is one of the factors determining oral bioavailability, alongside solubility and first-pass metabolism. Most PROTACs fall beyond the rule of five (bRo5) chemical space because of their larger molecular size compared with conventional inhibitors and consequently exhibit low cell membrane permeability. Because PROTACs induce protein degradation via the intracellular UPS, low cell membrane permeability reduces degradation activity. Therefore, improving the cell membrane permeability of PROTACs is essential for achieving sufficient pharmacological effects. High cell membrane permeability also allows lower doses to achieve effective intracellular concentrations and contributes to a reduced risk of toxicity.
To date, the cell membrane permeability of PROTACs has typically been controlled by optimizing their linkers through traditional manual design. Linkers are instrumental in determining key properties of PROTACs, including cell membrane permeability and protein degradation activity. For instance, Klein et al. improved both the cell membrane permeability and degradation activity of MZ1 and ARV-771, which are bromodomain and extra-terminal (BET) PROTACs, by replacing amide bonds with esters to reduce the number of hydrogen bond donors and increase hydrophobicity. Abeje et al. designed nine von Hippel-Lindau (VHL)-based PROTACs with different linker structures and evaluated their cell membrane permeability. They found that PROTACs adopting stable conformations through intramolecular interactions, including NH−π and π–π interactions, under low-polarity conditions exhibited higher cell membrane permeability. However, manual design based on empirical rules still requires substantial time and effort for linker optimization, creating a bottleneck in the PROTAC development cycle. Improving the efficiency of linker optimization is therefore expected to shorten PROTAC development timelines.
Recently, machine learning (ML)-based linker design methods that account for various molecular properties have been developed. For instance, Tan et al. developed DRlinker, which incorporates the linker length and logP. Neeser et al. proposed ShapeLinker, which considers the linker length, number of rotatable bonds, and linker conformation within a ternary complex. Li et al. introduced PROTAC-INVENT, which integrates both two-dimensional (2D) features, such as the number of aromatic rings in a linker, and three-dimensional (3D) features, such as the docking scores of ternary complexes. Zheng et al. developed PROTAC-RL, which employs the AbbVie multiparametric score, a metric indicative of oral absorption in the bRo5 space, as a design guideline. Despite these advances, PROTAC linker design methods that explicitly consider cell membrane permeability have yet to be reported.
One possible reason for the absence of such methods is the lack of open-source tools for assessing the cell membrane permeability of PROTACs. Although several previous studies have developed ML-based models to predict this property, limitations remain. For instance, Poongavanam et al. developed a classification model for the cell membrane permeability of PROTACs, but it was trained on an in-house data set and is not publicly available. Peteani et al. developed regression models for absorption, distribution, metabolism, excretion, and physicochemical properties, including the cell membrane permeability of PROTACs (apparent permeability, Papp, in a low-efflux Madin–Darby canine kidney cell line; R = 0.971), and provided the source code; however, the experimental data used for model training were not disclosed, making it impossible to reproduce the reported prediction performance. Thus, the lack of accessible, high-performance tools for evaluating the cell membrane permeability of PROTACs may be a key bottleneck in developing linker design methods that account for this property. Although public PROTAC databases, such as PROTAC-DB 3.0, have been established, they contain few permeability measurements, so ML models trained on them struggle to make reliable predictions for compounds that differ from the training data (extrapolated data).
In this study, we developed a PROTAC linker design method called PROTAC-TS, which considers cell membrane permeability. To evaluate the cell membrane permeability of PROTACs, we first constructed a prediction model based on PROTAC-DB 3.0. The prediction model achieved high prediction performance (coefficient of determination (R²) = 0.710, Pearson correlation coefficient (R) = 0.846, and root-mean-square error (RMSE) = 0.517) and is publicly available. Subsequently, we developed PROTAC-TS by implementing a linker generation function in ChemTSv2, a de novo molecule generator based on reinforcement learning, and integrated the prediction model to modulate cell membrane permeability. By applying filters, PROTAC-TS designs PROTAC linkers while considering PROTAC likeness, including structural stability, synthetic accessibility, structural branching, and linker length. Furthermore, to address the problem of extrapolated data, PROTAC-TS considers the applicability domain (AD) of the prediction model during linker design, focusing on PROTACs close to the training data so that predictions are more likely to remain reliable. PROTAC-TS successfully designed linkers for PROTACs with high predicted cell membrane permeability for several POI–E3 ligand pairs. It also reproduced a linker for a PROTAC with high cell membrane permeability, as listed in PROTAC-DB 3.0, and the linker of KT-474, a PROTAC that has reached phase 2 clinical trials. Additionally, experimental evaluations confirmed the high cell membrane permeability of PROTACs containing linkers designed by PROTAC-TS. These results indicate the potential of PROTAC-TS to accelerate the development of PROTACs with improved cell membrane permeability. The source code is available at https://github.com/ycu-iil/PROTAC-TS.
2. Results and Discussion
Initially, we constructed and compared prediction models for the cell membrane permeability of PROTACs; the best-performing model used a tabular prior-data fitted network (TabPFN). Subsequently, we developed PROTAC-TS by integrating ChemTSv2 with this prediction model (Figure 1) and applied the method to design PROTAC linkers. The adaptability and effectiveness of PROTAC-TS were further confirmed by reproducing the linker of a PROTAC with high cell membrane permeability listed in PROTAC-DB 3.0 and the linkers of PROTACs that have entered clinical trials. Finally, we experimentally validated the cell membrane permeability of several designed PROTACs.
Figure 1. Overview of the PROTAC-TS workflow. The workflow comprises two main modules: “Linker design” and “Linker evaluation”. In the “Linker design” module, a generative model trained on the PROTAC-DB 3.0 linker data set generates SMILES for linkers that connect a selected POI ligand and E3 ligand. In the “Linker evaluation” module, the generated linkers are first passed through predefined filters. The cell membrane permeability of the filtered linkers is then predicted using a prediction model trained on the PROTAC-DB 3.0 data set. A reward is calculated based on the predicted cell membrane permeability and fed back to the “Linker design” module. By repeating this cycle, PROTAC-TS designs linkers of PROTACs with favorable cell membrane permeability.
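The “Linker evaluation” module can be read as a short pipeline: filter, predict, then convert the prediction into a reward. The sketch below illustrates that ordering under stated assumptions and is not the PROTAC-TS implementation: `evaluate_linker`, the filter callables, and `predict_log_papp` are hypothetical names; the predictor is assumed to attach the linker to the selected POI and E3 ligands internally and return a log10 permeability; and rejected linkers simply return `None`, because how filtered linkers are treated inside the tree search is not described here.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a filter returns True when a linker SMILES passes it.
LinkerFilter = Callable[[str], bool]


def evaluate_linker(
    linker_smiles: str,
    filters: Sequence[LinkerFilter],
    predict_log_papp: Callable[[str], float],
    reward_fn: Callable[[float], float],
) -> Optional[float]:
    """Score one generated linker: filters first, then prediction, then reward.

    Returns None if any filter rejects the linker (an assumed convention);
    otherwise returns reward_fn applied to the predicted log10 permeability.
    """
    for passes in filters:
        if not passes(linker_smiles):
            return None  # rejected before the permeability model is called
    predicted = predict_log_papp(linker_smiles)  # assumed to build the PROTAC internally
    return reward_fn(predicted)


# Toy stand-ins for illustration only (not real PROTAC-TS components).
def two_attachment_points(smi: str) -> bool:
    return smi.count("*") == 2


def constant_model(smi: str) -> float:
    return 0.3


def clipped_reward(x: float) -> float:
    return min(1.0, max(0.0, x))


print(evaluate_linker("*CCOCC*", [two_attachment_points], constant_model, clipped_reward))
print(evaluate_linker("*CCOCC", [two_attachment_points], constant_model, clipped_reward))
```

Passing the filters and reward function in as parameters mirrors the point made in the Conclusions that both can be modified as required.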
2.1. Construction of Prediction Models for Cell Membrane Permeability
We constructed prediction models for the cell membrane permeability of PROTACs using PROTAC-DB 3.0 (described in Section 4.1). Various features and prediction methods were evaluated (Table 1). The features included Mordred descriptors containing only 2D descriptors, Mordred descriptors containing 3D descriptors, 2048-dimensional bit-based Morgan fingerprints, 2048-dimensional count-based Morgan fingerprints, and 500-dimensional count-based Morgan fingerprints. The prediction methods included TabPFN, light gradient boosting machine (LightGBM), and AutoGluon. Among all combinations, the model using the 500-dimensional count-based Morgan fingerprint as the feature and TabPFN as the prediction method achieved the highest prediction performance (R² = 0.710, R = 0.846, and RMSE = 0.517), as shown in Figure 2. For every method, the models using Mordred descriptors containing 3D descriptors outperformed those using Mordred descriptors containing only 2D descriptors. Among the Morgan fingerprint-based models, both TabPFN and AutoGluon performed better with count-based than with bit-based fingerprints of the same dimension (2048), whereas LightGBM performed slightly better with bit-based fingerprints. This result likely reflects the ability of count-based Morgan fingerprints to capture repetitive structural patterns frequently found in linkers containing polyethylene glycol or alkane moieties. Based on these results, we constructed a prediction model using the best-performing combination of feature and method. This model, trained on 43 compounds from PROTAC-DB 3.0, was used to evaluate the cell membrane permeability of PROTACs and thereby guide linker design.
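The count-versus-bit difference can be made concrete with a toy folding step. In a Morgan fingerprint, each atom environment (radius 2 here) is hashed to an integer identifier and folded into a fixed-length vector. In the sketch below, the identifiers are made-up numbers and folding by `id % n_bits` is a simplifying assumption, not a description of RDKit internals; `fold_fingerprint` is a hypothetical helper. A repeated environment, as in a PEG-like chain, keeps its multiplicity in the count vector but collapses to 1 in the bit vector.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fold_fingerprint(env_ids: Iterable[int], n_bits: int) -> Tuple[List[int], List[int]]:
    """Fold hashed atom-environment identifiers into count and bit vectors.

    Returns (count_vector, bit_vector) of length n_bits. The count vector stores
    how often each folded position occurs; the bit vector only stores presence.
    """
    counts = Counter(env_id % n_bits for env_id in env_ids)
    count_vector = [counts.get(i, 0) for i in range(n_bits)]
    bit_vector = [1 if c > 0 else 0 for c in count_vector]
    return count_vector, bit_vector


# Hypothetical identifiers: one environment repeated three times (e.g., a
# PEG-like repeat) plus one unique environment, folded to 500 dimensions.
env_ids = [1234567, 1234567, 1234567, 98765]
count_vec, bit_vec = fold_fingerprint(env_ids, n_bits=500)
print(count_vec[1234567 % 500], bit_vec[1234567 % 500])
```

With 500 positions, distinct identifiers are more likely to collide than with 2048; the sketch only shows the mechanics of the two encodings.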
Table 1. Performance of the Prediction Models for Caco-2 Cell Membrane Permeability
| model | feature | dimension | R² | R | RMSE |
|---|---|---|---|---|---|
| TabPFN | Mordred (2D) | 1422 | 0.617 | 0.788 | 0.594 |
| TabPFN | Mordred (2D and 3D) | 1473 | 0.636 | 0.802 | 0.579 |
| TabPFN | Morgan (bit-based) | 2048 | 0.646 | 0.808 | 0.571 |
| TabPFN | Morgan (count-based) | 2048 | 0.661 | 0.816 | 0.559 |
| **TabPFN** | **Morgan (count-based)** | **500** | **0.710** | **0.846** | **0.517** |
| LightGBM | Mordred (2D) | 1422 | 0.388 | 0.662 | 0.751 |
| LightGBM | Mordred (2D and 3D) | 1473 | 0.426 | 0.679 | 0.727 |
| LightGBM | Morgan (bit-based) | 2048 | 0.605 | 0.787 | 0.603 |
| LightGBM | Morgan (count-based) | 2048 | 0.599 | 0.776 | 0.608 |
| LightGBM | Morgan (count-based) | 500 | 0.649 | 0.814 | 0.568 |
| AutoGluon | Mordred (2D) | 1422 | 0.459 | 0.697 | 0.706 |
| AutoGluon | Mordred (2D and 3D) | 1473 | 0.557 | 0.749 | 0.639 |
| AutoGluon | Morgan (bit-based) | 2048 | 0.339 | 0.626 | 0.780 |
| AutoGluon | Morgan (count-based) | 2048 | 0.484 | 0.716 | 0.689 |
| AutoGluon | Morgan (count-based) | 500 | 0.548 | 0.750 | 0.645 |
The models were developed to predict the log10-transformed experimental A-to-B permeability using PROTAC SMILES as input. Values shown in bold indicate the best-performing results.
Figure 2. Scatterplot showing predicted vs actual cell membrane permeability. The points represent the mean of the predictive distribution from the best-performing prediction model (R² = 0.710, R = 0.846, and RMSE = 0.517). The upper and lower error bars correspond to the 70th and 30th percentiles of the predictive distribution, respectively.
2.2. Computational Validation for PROTAC-TS
2.2.1. PROTAC Linker Design Based on Bromodomain-Containing Protein 4 (BRD4) and VHL Ligands
PROTAC linkers were designed to improve the cell membrane permeability of PROTACs based on BRD4 and VHL ligands. PROTAC-TS designs linkers using a recurrent neural network (RNN)-based linker generator combined with chemical space exploration via Monte Carlo Tree Search (MCTS), as described in Section 4.2. The RNN-based linker generator was trained on PROTAC linkers collected from PROTAC-DB 3.0. The BRD4 and VHL ligands, both commonly used in PROTAC design, were selected as the POI and E3 ligands for linker design, respectively, as shown in Figure 3a. PROTAC-TS applies various filters during linker design to account for PROTAC likeness. In this study, we examined three filtering conditions: relaxed, intermediate, and strict. The relaxed condition applied only minimal filters. The intermediate condition added filters that excluded structurally unfavorable linkers, including those that were unstable, synthetically challenging, or excessively branched. The strict condition further added filters based on the AD of the prediction model, structural similarity to the PROTAC linkers in PROTAC-DB 3.0, and linker length. Under the relaxed condition, the reward values improved progressively during the design process, as shown in Figure 3b. However, many of the designed linkers contained unstable or synthetically challenging substructures, as shown in Figure S1. Under the intermediate condition, PROTAC-TS designed PROTAC linkers with fewer unstable, synthetically challenging, or excessively branched substructures, as shown in Figure S2. Under both the relaxed and intermediate conditions, structural complexity metrics, such as linker length, the number of heteroatoms, the number of rotatable bonds, and the number of rings, increased as linker generation proceeded, as shown in Figure S3.
Under the strict condition, the designed linkers contained fewer undesired structures and were more similar to the training data, as shown in Figures 3c and S4. This condition also suppressed the increase in structural complexity metrics during the generation process, as shown in Figure S3. These results demonstrate that PROTAC-TS can control the chemical space of the designed linkers through appropriate choice of filtering conditions.
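Because each condition extends the previous one, the three conditions can be written as nested filter sets. The sketch below transcribes the filter names given in Section 4.2 into a plain Python dictionary; the key names and layout are illustrative assumptions rather than the PROTAC-TS configuration format, and `all_filters` is a hypothetical helper. The case in which the alert-substructure filter was not applied is not modeled.

```python
# Filter names per condition, transcribed from the Methods description.
# The dictionary layout is illustrative, not the PROTAC-TS config format.
FILTER_CONDITIONS = {
    "relaxed": {
        "linker": ["attachment_point", "linker_validation", "radical_atom"],
        "protac": [],
    },
    "intermediate": {
        "linker": ["attachment_point", "linker_validation", "radical_atom",
                   "ring_structure", "branched_structure"],
        "protac": ["alert_substructure", "specific_substructure"],
    },
    "strict": {
        "linker": ["attachment_point", "linker_validation", "radical_atom",
                   "ring_structure", "branched_structure",
                   "linker_length", "linker_similarity"],
        "protac": ["alert_substructure", "specific_substructure",
                   "prediction_model_ad"],
    },
}


def all_filters(condition: str) -> set:
    """Return the combined set of linker-level and PROTAC-level filters."""
    spec = FILTER_CONDITIONS[condition]
    return set(spec["linker"]) | set(spec["protac"])


# Each stricter condition is a proper superset of the previous one.
print(all_filters("relaxed") < all_filters("intermediate") < all_filters("strict"))
```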
Figure 3. Results of linker design based on the BRD4 and VHL ligands. (a) Selected ligands. The yellow and green circles indicate the junction points with a linker. (b) Reward trends during linker generation. The blue, green, and red solid lines represent the moving average (window size = 500 linkers) of rewards from three independent runs under the relaxed, intermediate, and strict conditions, respectively. Shaded areas indicate the standard deviation across the runs. (c) Representative top-ranking linkers designed under the strict condition (described in Section 4.2). Atoms circled in yellow and green correspond to the junction points of the POI and E3 ligands, respectively. Asterisks denote PROTACs with linkers absent from PROTAC-DB 3.0.
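The reward curves in these figures are smoothed per run and then aggregated across runs. The following is a minimal sketch, assuming a trailing window of 500 linkers that starts once the window is full and the population standard deviation across runs (the caption does not state which estimator was used); `moving_average` and `aggregate_runs` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def moving_average(rewards: Sequence[float], window: int = 500) -> List[float]:
    """Trailing moving average, emitted once a full window is available."""
    if len(rewards) < window:
        return []
    running = sum(rewards[:window])
    smoothed = [running / window]
    for i in range(window, len(rewards)):
        running += rewards[i] - rewards[i - window]  # slide the window by one linker
        smoothed.append(running / window)
    return smoothed


def aggregate_runs(runs: Sequence[Sequence[float]],
                   window: int = 500) -> Tuple[List[float], List[float]]:
    """Per-step mean and population std of the smoothed rewards across runs."""
    curves = [moving_average(run, window) for run in runs]
    length = min(len(c) for c in curves)
    means = [mean([c[i] for c in curves]) for i in range(length)]
    stds = [pstdev([c[i] for c in curves]) for i in range(length)]
    return means, stds


# Hypothetical reward traces for three independent runs of 1000 linkers each.
runs = [[0.2] * 1000, [0.4] * 1000, [0.6] * 1000]
means, stds = aggregate_runs(runs)
print(len(means), round(means[0], 3), round(stds[0], 3))
```

The mean curve corresponds to the solid lines and the standard deviation to the shaded bands.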
2.2.2. PROTAC Linker Design Based on BRD4 and Cereblon (CRBN) Ligands
PROTAC linkers were designed with PROTAC-TS based on the BRD4 and CRBN ligands shown in Figure 4a. The linkers were designed under relaxed, intermediate, and strict filtering conditions (Figures S5–S8). Under the relaxed condition, the reward values improved progressively during the design process, as shown in Figure 4b. Under the strict condition, PROTAC-TS reproduced, with a high reward value, the linker of Compound 2895, a PROTAC with high cell membrane permeability listed in PROTAC-DB 3.0, as shown in Figure 4c. PROTAC-TS also designed other linkers for PROTACs that either achieved high reward values or were structurally similar to Compound 2895, as shown in Figure 4c,d. The experimental cell membrane permeability of Compound 2895 recorded in PROTAC-DB 3.0 was 2.14 μcm/s (log10 scale: 0.3304), whereas the predicted value was 0.91 μcm/s (log10 scale: −0.0402). The absolute difference on the log10 scale (0.37) was smaller than the RMSE of the prediction model (0.517), suggesting that the model retained a reasonable level of prediction performance even for a PROTAC not included in its training data set (for this analysis, the model was retrained without Compound 2895). These results indicate that PROTAC-TS can generate linkers for PROTACs with high cell membrane permeability that are not present in the training data set and show its potential for designing novel and promising PROTAC linkers.
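The error comparison for Compound 2895 can be reproduced from the reported numbers alone, working on the log10 scale used as the training target. The variable names are illustrative; every value comes from the text above.

```python
import math

rmse = 0.517                  # LOOCV RMSE of the best model (log10 units)
experimental_papp = 2.14      # Compound 2895 in PROTAC-DB 3.0 (μcm/s)
predicted_log_papp = -0.0402  # reported prediction on the log10 scale

experimental_log_papp = math.log10(experimental_papp)        # about 0.3304
abs_error = abs(experimental_log_papp - predicted_log_papp)   # about 0.3706
predicted_papp = 10 ** predicted_log_papp                     # about 0.912, reported as 0.91

print(f"|error| = {abs_error:.4f} log units; within RMSE: {abs_error < rmse}")
```

Comparing on the log10 scale matters: in linear units the prediction is off by a factor of about 2.3, which is still within the model's typical error of roughly a factor of 10^0.517 ≈ 3.3.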
Figure 4. Results of linker design based on the BRD4 and CRBN ligands. (a) Selected ligands and Compound 2895. The yellow and green circles indicate the junction points with a linker. (b) Reward trends during linker generation. The blue, green, and red solid lines represent the moving average (window size = 500 linkers) of rewards from three independent runs under the relaxed, intermediate, and strict conditions, respectively. Shaded areas indicate the standard deviation across the runs. (c) The linker of Compound 2895 and linkers that showed high similarity to it, designed under the strict condition. Tanimoto similarity (denoted as “Similarity” in the figure) was calculated between Compound 2895 and each PROTAC with a designed linker using Morgan fingerprints with a radius of 2 and a dimension of 2048. (d) Representative top-ranking linkers designed under the strict condition (described in Section 4.2). Atoms circled in yellow and green correspond to the junction points of the POI and E3 ligands, respectively. Asterisks denote PROTACs with linkers absent from PROTAC-DB 3.0.
2.3. Experimental Validation of PROTAC-TS
We applied PROTAC-TS to design linkers of PROTACs targeting interleukin-1 receptor-associated kinase 4 (IRAK4), Bruton’s tyrosine kinase (BTK), anaplastic lymphoma kinase (ALK), and BRD4. We selected five representative PROTACs from the designed PROTACs and experimentally evaluated their cell membrane permeability.
2.3.1. IRAK4 PROTAC
Based on the IRAK4 and CRBN ligands shown in Figure 5a, PROTAC linkers were designed using PROTAC-TS under relaxed, intermediate, and strict filtering conditions (Figures S9–S12). Under the relaxed condition, reward values improved progressively during the design process, as shown in Figure 5b. Under the strict condition, PROTAC-TS successfully reproduced the linker of KT-474, excluding stereochemical information, as shown in Figure 5c. The experimental result for the cell membrane permeability of KT-474 is described in Section 2.3.5. In addition, linkers with higher reward values were designed, as shown in Figure 5d. Although KT-474 was not included in the training data of the prediction model and its linker was not included in the training data of the RNN-based linker generator, PROTAC-TS reproduced the KT-474 linker and designed multiple structurally similar linkers, as shown in Figure 5c, supporting the validity and generalizability of this method. These results highlight the potential of PROTAC-TS to design linkers for PROTACs with favorable cell membrane permeability comparable to that of PROTACs in clinical development and demonstrate its promise as a valuable method for future PROTAC development.
Figure 5. Results of linker design based on the IRAK4 and CRBN ligands. (a) Selected ligands and KT-474. The yellow and green circles indicate the junction points with a linker. (b) Reward trends during linker generation. The blue, green, and red solid lines represent the moving average (window size = 500 linkers) of rewards from three independent runs under the relaxed, intermediate, and strict conditions, respectively. Shaded areas indicate the standard deviation across the runs. (c) The linker of KT-474 and linkers that showed high similarity to it, designed under the strict condition. Tanimoto similarity (denoted as “Similarity” in the figure) was calculated between KT-474 and each PROTAC with a designed linker using Morgan fingerprints with a radius of 2 and a dimension of 2048. (d) Representative top-ranking linkers designed under the strict condition (described in Section 4.2). Atoms circled in yellow and green correspond to the junction points of the POI and E3 ligands, respectively. Asterisks denote PROTACs with linkers absent from PROTAC-DB 3.0.
2.3.2. BTK PROTAC
For the BTK and CRBN ligands shown in Figure 6a, PROTAC linkers were designed using PROTAC-TS under relaxed, intermediate, and strict filtering conditions (Figures S13 and S14). Under the intermediate condition, reward values improved progressively during the design process, as shown in Figure S14, indicating that PROTAC-TS can design PROTAC linkers that improve predicted cell membrane permeability. In contrast, under the relaxed condition, reward values did not improve noticeably. Linker lengths remained unchanged during the design process, whereas the maximum branch length was relatively high, as shown in Figure S14. This suggests that, under this condition, PROTAC-TS preferentially explored complex branched linkers rather than simple longer linkers, and this insufficient exploration of longer linkers likely limited the improvement in reward. Under the strict condition, PROTAC-TS successfully reproduced the linker of NX-2127, which is currently undergoing phase 1 clinical trials. The experimental result for the cell membrane permeability of NX-2127 is described in Section 2.3.5.
Figure 6. Results of linker design for PROTACs targeting (a) BTK, (b) ALK, and (c) BRD4. The left, middle, and right columns display the selected ligands, examples of the designed linkers under the strict condition, and PROTACs selected for experimental evaluation, respectively. Numerical values indicate reward scores. Atoms circled in yellow and green correspond to the junction points of the POI and E3 ligands, respectively. Asterisks denote PROTACs with linkers absent from PROTAC-DB 3.0.
2.3.3. ALK PROTAC
PROTAC linkers for the ALK and CRBN ligands were designed using PROTAC-TS under relaxed, intermediate, and strict filtering conditions, as shown in Figure 6b. As in Section 2.3.2, reward values improved progressively during the design process under the intermediate condition, whereas no noticeable improvement was observed under the relaxed and strict conditions, as shown in Figure S15. Under the relaxed condition, the limited improvement in reward values is likely attributable to insufficient exploration of longer linkers. Under the strict condition, although the reward values did not show a marked upward trend, PROTAC-TS successfully designed chemically valid linker structures, as shown in Figure S16. We experimentally evaluated the cell membrane permeability of MS4078, which contains a linker generated under the strict condition and is commercially available, as described in Section 2.3.5.
2.3.4. BRD4 PROTAC
For the BRD4 ligand and the thalidomide-based CRBN ligand (distinct from the one used in Section ) shown in Figure 6c, PROTAC linkers were designed using PROTAC-TS under relaxed, intermediate, and strict filtering conditions (Figures S17 and S18). Under both the relaxed and intermediate conditions, reward values improved during the early phase of the design process, as shown in Figure S18. We experimentally evaluated the cell membrane permeability of dBET6 and dBET57, both of which contain linkers generated under the strict condition and are commercially available, as described in Section 2.3.5.
2.3.5. Experimental Evaluation of Generated PROTACs
We experimentally evaluated the cell membrane permeability of KT-474, NX-2127, MS4078, dBET6, and dBET57. Details of the experimental conditions are described in Section . The experimental values are listed in Table S1. Overall, the experimental values tended to be higher than the predicted values. KT-474, dBET6, and dBET57 exhibited relatively high cell membrane permeability. For dBET6 and dBET57, the discrepancies between the predicted and experimental values were relatively small. Although NX-2127 and MS4078 showed comparatively large discrepancies, their experimental values fell between the 10th and 90th percentiles of the predictive distribution. In addition, a moderate correlation was observed between the predicted and experimental values (R = 0.732, n = 5), as shown in Figure S19. These results provide preliminary support for the robustness of the prediction model. In contrast, the experimental value of KT-474 fell outside its 10th to 90th percentile range, which was also wider than that of the other PROTACs. Additionally, the maximum Tanimoto similarity of KT-474 to any PROTAC in the training data set was lower than that of the other PROTACs, as shown in Table S2. These observations suggest that KT-474 lies relatively far from the training data and that accurately estimating its cell membrane permeability is challenging for the current prediction model.
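The percentile check used above can be sketched as follows. How TabPFN exposes its predictive distribution is not described here, so the sketch assumes the distribution is available as a list of samples on the log10 scale; the sample values are hypothetical, and `percentile_interval` and `within_interval` are illustrative names. The same helper yields the 30th and 70th percentiles used for the error bars in Figure 2.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def percentile_interval(samples: Sequence[float], lower: int = 10,
                        upper: int = 90) -> Tuple[float, float]:
    """Lower/upper percentiles of predictive samples (multiples of 10 only)."""
    deciles = quantiles(samples, n=10)  # 9 cut points: 10th, 20th, ..., 90th
    return deciles[lower // 10 - 1], deciles[upper // 10 - 1]


def within_interval(observed: float, samples: Sequence[float],
                    lower: int = 10, upper: int = 90) -> bool:
    """True if an observed log10 permeability lies inside the interval."""
    low, high = percentile_interval(samples, lower, upper)
    return low <= observed <= high


# Hypothetical predictive samples on the log10 scale, evenly spread for readability.
samples = [i / 100 for i in range(-49, 50)]
print(percentile_interval(samples))          # 10th and 90th percentiles
print(percentile_interval(samples, 30, 70))  # error-bar range as in Figure 2
print(within_interval(0.2, samples), within_interval(0.45, samples))
```

A wider 10th to 90th percentile range, as reported for KT-474, corresponds to a more spread-out set of samples, that is, a less certain prediction.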
3. Conclusions
In this study, we developed PROTAC-TS, a PROTAC linker design method for improving cell membrane permeability that integrates a high-performance prediction model for this property. PROTAC-TS designs linkers while accounting for the AD of the prediction model to mitigate the uncertainty arising from the limited size of the training data set. PROTAC-TS successfully designed linkers of PROTACs with high predicted cell membrane permeability. Notably, it reproduced the linker of Compound 2895, a PROTAC with high cell membrane permeability in PROTAC-DB 3.0, and the linkers of PROTACs that have entered clinical trials. PROTAC-TS also designed other linkers of PROTACs predicted to exhibit high cell membrane permeability. In addition, our results show that PROTAC linkers designed with high reward values tend to be relatively flexible, although some contain ring structures. This trend suggests that PROTAC-TS may be able to design PROTAC linkers that take chameleonicity into account; chameleonicity, the ability of a molecule to adapt its conformation to the surrounding environment, is one possible way to improve cell membrane permeability. Furthermore, experimental evaluations confirmed that PROTACs containing linkers designed by PROTAC-TS, such as dBET6 and KT-474, exhibited relatively high cell membrane permeability. In PROTAC-TS, the reward function and filtering conditions can be modified as required. By exploring a diverse range of candidate molecules while considering cell membrane permeability and PROTAC likeness, PROTAC-TS may also help medicinal chemists avoid overlooking promising candidates and provide insights that inform novel linker design strategies. These results suggest that PROTAC-TS is a promising approach for accelerating the design of PROTACs with favorable cell membrane permeability.
Despite these advances, several aspects of this study require further improvement. The first challenge is the size of the data set used to train the prediction model for cell membrane permeability, because experimental data on the cell membrane permeability of PROTACs are scarce. Although we mitigated this limitation by applying the prediction-model AD filter during linker design, expanding the data set will be desirable in future studies. The second challenge is the choice of evaluation metric for cell membrane permeability. Although Caco-2 cell-based assays were used in this study, other assays, such as parallel artificial membrane permeability assays, are also commonly used to assess cell membrane permeability. Because PROTACs are often susceptible to the effects of efflux transporters, the choice of assay requires careful consideration to ensure physiological relevance. The third challenge is that properties beyond cell membrane permeability were not considered. PROTAC design requires balancing additional properties, such as degradation activity toward POIs and solubility. Future studies should address these challenges to develop a more comprehensive and practical PROTAC design platform.
4. Methods
4.1. Prediction Models for Cell Membrane Permeability
We evaluated four types of molecular features derived from simplified molecular input line entry system (SMILES) strings and three prediction methods. The features were Mordred descriptors containing only 2D descriptors, Mordred descriptors containing 3D descriptors, bit-based Morgan fingerprints, and count-based Morgan fingerprints. For the 3D Mordred descriptors, descriptor values were averaged over 10 conformers generated using RDKit. Morgan fingerprints with a radius of 2 were also calculated using RDKit. TabPFN, LightGBM, and AutoGluon were evaluated as prediction methods. TabPFN and AutoGluon were used with default parameters, whereas the hyperparameters of LightGBM were optimized using Optuna. The experimentally measured A-to-B permeability across Caco-2 cell membranes was converted to a log10 scale and used as the target variable. The training data set consisted of 43 compounds with available Caco-2 permeability data from PROTAC-DB 3.0; Compound 258 was excluded because it was measured under different experimental conditions. For entries reported as “<x (μcm/s),” the value was replaced with “x/2 (μcm/s)” (e.g., <2 (μcm/s) was replaced with 1 (μcm/s)). In Section 2.2.2, the prediction model was reconstructed from the training data excluding Compound 2895; the performance of this model is shown in Figure S20. Model performance was evaluated using leave-one-out cross-validation, with R, R², and RMSE as the evaluation metrics.
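The target preprocessing and evaluation protocol can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: `fit_predict` is a hypothetical stand-in for TabPFN, LightGBM, or AutoGluon; `parse_papp`, `loocv_predictions`, and `regression_metrics` are illustrative names; the raw values in the usage example are hypothetical apart from 2.14; and R² is assumed to be computed once over the pooled leave-one-out predictions.

```python
import math
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")


def parse_papp(raw: str) -> float:
    """Convert a Caco-2 A-to-B Papp entry (μcm/s) to the log10 target.

    Censored entries written as "<x" are replaced with x/2 before the log.
    """
    raw = raw.strip()
    value = float(raw[1:]) / 2 if raw.startswith("<") else float(raw)
    return math.log10(value)


def loocv_predictions(features: Sequence[X], targets: Sequence[float],
                      fit_predict: Callable[[List[X], List[float], X], float]) -> List[float]:
    """Leave-one-out CV: each compound is predicted by a model fit on the rest."""
    preds = []
    for i in range(len(targets)):
        train_x = [x for j, x in enumerate(features) if j != i]
        train_y = [y for j, y in enumerate(targets) if j != i]
        preds.append(fit_predict(train_x, train_y, features[i]))
    return preds


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """R² (1 - SS_res/SS_tot), Pearson R, and RMSE over pooled predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "R2": 1 - ss_res / ss_tot,
        "R": cov / math.sqrt(ss_tot * ss_pred),
        "RMSE": math.sqrt(ss_res / n),
    }


# Trivial mean predictor standing in for TabPFN/LightGBM/AutoGluon.
def mean_model(train_x, train_y, x):
    return sum(train_y) / len(train_y)


y = [parse_papp(v) for v in ["<2", "2.14", "10", "0.5"]]
preds = loocv_predictions(list(range(len(y))), y, mean_model)
print(regression_metrics(y, preds))
```

The mean predictor is a useful sanity check: under leave-one-out it produces predictions that are perfectly anti-correlated with the targets (R = −1), so any real model should clear it easily.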
4.2. Linker Design Method
We developed PROTAC-TS by implementing a linker generation function in ChemTSv2, an ML-based de novo molecule generator. ChemTSv2 combines an RNN and MCTS within a reinforcement learning framework. The RNN-based linker generator was trained on linker SMILES and generates new SMILES strings, while MCTS explores the chemical space based on a predefined reward function to design molecules with desirable properties. The RNN-based linker generator was modified to treat an asterisk as an attachment point between the linker and the ligands. For training, 2749 linker SMILES were curated from PROTAC-DB 3.0 by removing ionized structures from the original set of 2753 linker SMILES. Each curated linker was randomized four times, and the randomized SMILES were added to the curated data set. After duplicates were removed, 12,863 linker SMILES were used as the training data. For the RNN-based linker generation described in Sections 2.2.2, 2.3.2, 2.3.3, and 2.3.4, the linkers of Compound 2895, NX-2127, and MS4078 and those of dBET6 and dBET57 were excluded from the training data set, respectively. The reward function was defined as a left-sided Gaussian function of the predicted cell membrane permeability with parameters μ = 0.25 and σ = 1, as shown in Figure S21.
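Figure S21 defines the exact reward curve. The sketch below assumes the common “left-sided” form, in which predictions at or above μ receive the maximum reward of 1 and lower predictions are penalized by a Gaussian decay of width σ; the function name is hypothetical. Because the target was log10-transformed, μ = 0.25 corresponds to about 10^0.25 ≈ 1.78 μcm/s.

```python
import math


def left_sided_gaussian_reward(pred_log_papp: float, mu: float = 0.25,
                               sigma: float = 1.0) -> float:
    """Reward for a predicted log10 Caco-2 permeability (assumed shape).

    Returns 1.0 at or above mu; below mu, returns exp(-(x - mu)^2 / (2 sigma^2)),
    so predictions further below the target earn smaller rewards.
    """
    if pred_log_papp >= mu:
        return 1.0
    return math.exp(-((pred_log_papp - mu) ** 2) / (2 * sigma ** 2))


for x in (1.0, 0.25, 0.0, -0.75, -1.75):
    print(x, round(left_sided_gaussian_reward(x), 3))
```

The plateau above μ means the search is not pushed toward ever-higher predicted permeability, where the model would be extrapolating.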
PROTAC-TS supports the application of filters during linker generation. In this study, linkers were designed under three filtering conditions (relaxed, intermediate, and strict), as summarized in Table S3. The relaxed condition consisted of the attachment-point, linker-validation, and radical-atom filters. The attachment-point filter excluded linker SMILES in which the number of asterisks, which mark the connection points to the POI ligand and the E3 ligand, was not equal to two. The linker-validation filter excluded linkers that yielded invalid molecules when attached to both ligands. The radical-atom filter excluded linkers containing radical electrons. The intermediate condition consisted of filters applied both to each linker and to the corresponding PROTAC generated by attaching the linker to both ligands. The filters applied to the linkers were the attachment-point, linker-validation, radical-atom, ring-structure, and branched-structure filters. The ring-structure filter excluded linkers containing rings with fewer than five or more than six members. The branched-structure filter excluded linkers containing a branch of two or more consecutive atoms off the shortest path between the two attachment points; atoms within a ring were not considered branches if the ring itself was part of the shortest path. The filters applied to the PROTACs were the alert-substructure and specific-substructure filters. The alert-substructure filter excluded linkers that yielded PROTACs containing substructures listed under “Common Alerts” in the medchem package. In Section , the alert-substructure filter was not applied. The specific-substructure filter excluded linkers that yielded PROTACs containing substructures considered synthetically challenging or chemically unstable; these substructures were specified in SMILES arbitrary target specification (SMARTS) format and are listed in Table S4.
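The linker-level structural filters of the intermediate condition can be sketched on a heavy-atom graph. In this minimal sketch the linker is a hypothetical adjacency dictionary (in practice it would be derived from a cheminformatics toolkit such as RDKit), rings are passed in as lists of atom indices, and "a branch of two or more consecutive atoms" is interpreted as any heavy atom lying two or more bonds from the shortest path; ring atoms are counted as part of the path when the ring shares at least two atoms with it. These interpretations and all function names are assumptions, not the PROTAC-TS implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, Sequence

Graph = Dict[int, List[int]]  # heavy-atom index -> neighbouring atom indices


def passes_attachment_point_filter(linker_smiles: str) -> bool:
    """Exactly two '*' attachment points (one per ligand)."""
    return linker_smiles.count("*") == 2


def passes_ring_size_filter(ring_sizes: Iterable[int]) -> bool:
    """Only five- or six-membered rings are allowed."""
    return all(5 <= size <= 6 for size in ring_sizes)


def shortest_path(graph: Graph, start: int, goal: int) -> List[int]:
    """Breadth-first search; returns atom indices from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if goal not in previous:
        return []
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def is_branched(graph: Graph, start: int, goal: int,
                rings: Sequence[Sequence[int]] = ()) -> bool:
    """True if any atom lies two or more bonds away from the shortest path."""
    path = set(shortest_path(graph, start, goal))
    core = set(path)
    for ring in rings:
        # A ring traversed by the shortest path is treated as part of the path.
        if len(path & set(ring)) >= 2:
            core |= set(ring)
    # Multi-source BFS: distance of every atom from the (extended) path.
    distance = {atom: 0 for atom in core}
    queue = deque(core)
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return any(d >= 2 for d in distance.values())
```

For example, a methyl group on the main chain sits one bond from the path and is tolerated, whereas an ethyl group places an atom two bonds away and is rejected.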
The strict condition likewise consisted of filters applied to each linker and to the corresponding PROTAC generated by attaching the linker to both ligands. The filters applied to the linkers were the attachment-point, linker-validation, radical-atom, ring-structure, branched-structure, linker-length, and linker-similarity filters. The linker-length filter excluded linkers whose shortest-path length between the attachment points exceeded 15. The linker-similarity filter excluded linkers whose maximum Tanimoto similarity to the 2748 linkers used to train the RNN-based linker generator was below 0.3. The filters applied to the PROTACs were the alert-substructure, specific-substructure, and prediction-model-AD filters. The prediction-model-AD filter excluded linkers that yielded PROTACs outside the applicability domain (AD) of the prediction model. To define the AD, we adopted the maximum Tanimoto similarity to the training data, a simple and commonly used approach. The similarity was calculated using Morgan fingerprints with a radius of 2 and a length of 2048 bits, and the threshold was set to 0.1. For each condition, 30,000 linkers were designed in each of three independent runs. To reproduce the linkers of Compound 2895 and KT-474, 30,000 linkers were designed in 10 and 100 independent runs, respectively.
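The similarity-based filters of the strict condition reduce to a maximum Tanimoto similarity against a reference set. In this sketch, binary fingerprints are represented as sets of on-bit indices; in practice they would be RDKit Morgan fingerprints (radius 2, 2048 bits), and RDKit's `DataStructs.BulkTanimotoSimilarity` could compute the similarities. Treating the thresholds as inclusive lower bounds, and the path length as a plain integer, are assumptions; the function names are hypothetical.

```python
from typing import Iterable, Set

Fingerprint = Set[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A & B| / |A | B| for binary fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query: Fingerprint, references: Iterable[Fingerprint]) -> float:
    """Highest Tanimoto similarity of the query to any reference fingerprint."""
    return max((tanimoto(query, ref) for ref in references), default=0.0)


def passes_linker_length_filter(path_length: int, max_length: int = 15) -> bool:
    """Keep linkers whose attachment-to-attachment path does not exceed the limit."""
    return path_length <= max_length


def passes_linker_similarity_filter(linker_fp: Fingerprint,
                                    training_linker_fps: Iterable[Fingerprint],
                                    threshold: float = 0.3) -> bool:
    """Keep linkers reasonably close to the generator's training linkers."""
    return max_similarity(linker_fp, training_linker_fps) >= threshold


def inside_applicability_domain(protac_fp: Fingerprint,
                                training_protac_fps: Iterable[Fingerprint],
                                threshold: float = 0.1) -> bool:
    """Keep PROTACs whose nearest training compound is similar enough."""
    return max_similarity(protac_fp, training_protac_fps) >= threshold
```

Because both filters use the nearest neighbour only, a single sufficiently similar training compound is enough to keep a candidate; the much lower AD threshold (0.1) reflects how sparse the 43-compound permeability training set is compared with the linker corpus.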
4.3. Linker Clustering
Clustering analysis was performed to visualize the structural diversity of the designed linkers. It was applied to the linkers designed under all filtering conditions in Sections , 2.2.2, and 2.3.1, and under the strict condition in Sections , 2.3.3, and 2.3.4. The top 5000 generated linkers, ranked by reward, were clustered with the K-medoids algorithm using Morgan fingerprints (radius 2, 2048 bits) as input features. The number of clusters was set to 10.
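The clustering step can be illustrated with a small K-medoids implementation. This sketch makes assumptions not stated in the text: fingerprints are sets of on-bit indices, the dissimilarity is the Tanimoto distance (1 − similarity), and medoids are updated with the simple alternating (Voronoi-iteration) scheme rather than PAM. The library, distance, and initialization actually used for the analysis are not specified here, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity; two empty fingerprints count as identical."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def k_medoids(fps: Sequence[Set[int]], k: int, max_iter: int = 100,
              seed: int = 0) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids; returns (medoid indices, cluster label per item)."""
    n = len(fps)
    dist = [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)

    def assign(current: List[int]) -> List[int]:
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    labels = assign(medoids)
    for _ in range(max_iter):
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels = assign(medoids)
    return medoids, labels
```

The full distance matrix makes this quadratic in memory and time; for 5000 linkers and 10 clusters, an optimized library implementation with a precomputed distance matrix would be the practical choice.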
4.4. Caco-2 Cell Membrane Permeability Study
KT-474 (Catalog No. E1655), NX-2127 (Catalog No. E1381), MS4078 (Catalog No. S0072), dBET6 (Catalog No. S8762), and dBET57 (Catalog No. S0137) were purchased from Selleck Chemicals (Houston, TX, USA). For NX-2127, which has a stereogenic center in the pyrrolidine moiety, the S-enantiomer was used. The Caco-2 permeability assay of these PROTACs was outsourced to Axcelead Drug Discovery Partners, Inc. (Kanagawa, Japan). Caco-2 cells (American Type Culture Collection, Manassas, VA, USA) were cultured, and a transcellular transport study was performed. The cells were cultured on Transwell 96-well permeable supports (pore size 0.4 μm, surface area 0.11 cm²) with a polycarbonate membrane (PSHT004S5, Millipore Corporation, Bedford, MA, USA). The cells were preincubated for 10 min at 37 °C with bovine serum albumin (BSA)-free Hank’s balanced salt solution (HBSS) in the apical compartment (75 μL) and HBSS containing 4% BSA in the basolateral compartment (250 μL). Transcellular transport was then initiated by adding BSA-free HBSS containing 10 μM test compound to the apical compartments. The assay was terminated after 2 h by separating each assay plate. Aliquots (50 μL) from the basolateral compartment were mixed with acetonitrile/methanol. After centrifugation, compound concentrations in the supernatant were measured by liquid chromatography–tandem mass spectrometry (LC–MS/MS) on a Kinetex C18 column (2.6 μm, 2.1 × 30 mm). The apparent permeability in the apical-to-basolateral direction (Papp, A→B) was determined from the receiver concentrations. Lucifer yellow was added to the apical compartment at 100 μM as a monolayer integrity marker, and its Papp was reported for each well in which a test compound was evaluated. The Papp of each compound was calculated from a one-point standard curve (0.1 μM) using the following equation.
$$P_{\mathrm{app}} = \frac{C \times V}{T \times A \times C_0}$$
where C (μM) is the measured concentration of the test compound on the basolateral side after incubation; V (mL) is the volume of the basolateral compartment (0.25 mL); T (s) is the permeation time (7200 s); A (cm²) is the membrane area (0.11 cm²); and C0 (μM) is the measured initial concentration of the test compound in the apical compartment.
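As a worked example of Papp = (C × V)/(T × A × C0), the sketch below uses the fixed assay parameters (V = 0.25 mL, T = 7200 s, A = 0.11 cm²) together with hypothetical concentrations (C = 0.5 μM, C0 = 10 μM) that are not data from this study. It also converts the result to log10 Papp in units of 10⁻⁶ cm/s, which is assumed here to be the scale of the prediction target.

```python
import math


def apparent_permeability(c_receiver_uM: float, c0_uM: float,
                          volume_mL: float = 0.25, time_s: float = 7200.0,
                          area_cm2: float = 0.11) -> float:
    """Papp in cm/s from (C x V) / (T x A x C0).

    1 mL = 1 cm^3 and the concentration units cancel, so
    (uM x cm^3) / (s x cm^2 x uM) leaves cm/s.
    """
    return (c_receiver_uM * volume_mL) / (time_s * area_cm2 * c0_uM)


# Hypothetical measurement: 0.5 uM in the receiver after 2 h, 10 uM dosed.
papp = apparent_permeability(0.5, 10.0)
papp_1e6 = papp * 1e6            # Papp in units of 10^-6 cm/s
log_papp = math.log10(papp_1e6)  # log10 scale used for modeling (assumed)
```

With these inputs, Papp ≈ 1.58 × 10⁻⁵ cm/s, i.e., about 15.8 in units of 10⁻⁶ cm/s. Only the ratio C/C0 enters the formula, so any consistent concentration unit gives the same result.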
Supplementary Material
Acknowledgments
This work was supported by the Japan Society for the Promotion of Science KAKENHI [grant number JP24KJ1877 to Y.M.]. K.T. is supported by JST FOREST Program [grant number: JPMJFR232U], JST BOOST Program [grant number JPMJBY24F0], JSPS KAKENHI [grant number JP24H00473], and “Development of a Next-generation Drug Discovery AI through Industry-Academia Collaboration (DAIIA)” [grant number JP23nk0101111], and MEXT Simulation- and AI-driven next-generation medicine and drug discovery based on “Fugaku” [grant number JPMXP1020230120].
The source code and models are available at https://github.com/ycu-iil/PROTAC-TS.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jacsau.6c00033.
Figures S1 to S21 and Tables S1 to S4 (PDF)
Yuki Murakami: conceptualization (lead), funding acquisition (lead), methodology (lead), software (lead), writing – original draft (lead), writing – review and editing (lead). Shoichi Ishida: conceptualization (support), methodology (lead), software (lead), writing – review and editing (support). Nobuo Cho: methodology (support), writing – review and editing (support). Hitomi Yuki: methodology (support), writing – review and editing (support). Masateru Ohta: methodology (support), writing – review and editing (support). Teruki Honma: methodology (support), writing – review and editing (support). Yosuke Demizu: conceptualization (support), methodology (support), writing – review and editing (support). Kei Terayama: supervision (lead), conceptualization (lead), funding acquisition (lead), methodology (lead), project administration (lead), software (support), writing – review and editing (lead). All authors approved the final version of the manuscript.
The authors declare no competing financial interest.
References
- Tsai J. M., Nowak R. P., Ebert B. L., Fischer E. S.. Targeted Protein Degradation: From Mechanisms to Clinic. Nat. Rev. Mol. Cell Biol. 2024;25(9):740–757. doi: 10.1038/s41580-024-00729-9.
- Dale B., Cheng M., Park K.-S., Kaniskan H. Ü., Xiong Y., Jin J.. Advancing Targeted Protein Degradation for Cancer Therapy. Nat. Rev. Cancer. 2021;21(10):638–654. doi: 10.1038/s41568-021-00365-x.
- Chirnomas D., Hornberger K. R., Crews C. M.. Protein Degraders Enter the Clinic: A New Approach to Cancer Therapy. Nat. Rev. Clin. Oncol. 2023;20(4):265–278. doi: 10.1038/s41571-023-00736-3.
- Ciulli A., Trainor N.. A Beginner’s Guide to PROTACs and Targeted Protein Degradation. Biochemist. 2021;43(5):74–79. doi: 10.1042/bio_2021_148.
- Snyder L. B., Neklesa T. K., Willard R. R., Gordon D. A., Pizzano J., Vitale N., Robling K., Dorso M. A., Moghrabi W., Landrette S., Gedrich R., Lee S. H., Taylor I. C. A., Houston J. G.. Preclinical Evaluation of Bavdegalutamide (ARV-110), a Novel Proteolysis TArgeting Chimera Androgen Receptor Degrader. Mol. Cancer Ther. 2025;24(4):511–522. doi: 10.1158/1535-7163.MCT-23-0655.
- Ma Z., Zhou J.. NDA Submission of Vepdegestrant (ARV-471) to U.S. FDA: The Beginning of a New Era of PROTAC Degraders. J. Med. Chem. 2025;68(14):14129–14136. doi: 10.1021/acs.jmedchem.5c01818.
- Békés M., Langley D. R., Crews C. M.. PROTAC Targeted Protein Degraders: The Past Is Prologue. Nat. Rev. Drug Discov. 2022;21(3):181–200. doi: 10.1038/s41573-021-00371-6.
- Apprato G., Ermondi G., Caron G.. The Quest for Oral PROTAC Drugs: Evaluating the Weaknesses of the Screening Pipeline. ACS Med. Chem. Lett. 2023;14(7):879–883. doi: 10.1021/acsmedchemlett.3c00231.
- Ermondi G., Garcia Jimenez D., Rossi Sebastiano M., Caron G.. Rational Control of Molecular Properties Is Mandatory to Exploit the Potential of PROTACs as Oral Drugs. ACS Med. Chem. Lett. 2021;12(7):1056–1060. doi: 10.1021/acsmedchemlett.1c00298.
- Apprato G., Poongavanam V., Garcia Jimenez D., Atilaw Y., Erdelyi M., Ermondi G., Caron G., Kihlberg J.. Exploring the Chemical Space of Orally Bioavailable PROTACs. Drug Discov Today. 2024;29(4):103917. doi: 10.1016/j.drudis.2024.103917.
- Poongavanam V., Kihlberg J.. PROTAC Cell Permeability and Oral Bioavailability: A Journey Into Uncharted Territory. Future Med. Chem. 2022;14(3):123–126. doi: 10.4155/fmc-2021-0208.
- Maple H. J., Clayden N., Baron A., Stacey C., Felix R.. Developing Degraders: Principles and Perspectives on Design and Chemical Space. Medchemcomm. 2019;10(10):1755–1764. doi: 10.1039/C9MD00272C.
- Edmondson S. D., Yang B., Fallan C.. Proteolysis Targeting Chimeras (PROTACs) in ‘beyond Rule-of-Five’ Chemical Space: Recent Progress and Future Challenges. Bioorg. Med. Chem. Lett. 2019;29(13):1555–1564. doi: 10.1016/j.bmcl.2019.04.030.
- Yokoo H., Osawa H., Saito K., Demizu Y.. Correlation between Membrane Permeability and the Intracellular Degradation Activity of Proteolysis-Targeting Chimeras. Chem. Pharm. Bull. (Tokyo) 2024;72(11):961–965. doi: 10.1248/cpb.c24-00615.
- Zagidullin A., Milyukov V., Rizvanov A., Bulatov E.. Novel Approaches for the Rational Design of PROTAC Linkers. Explor Target Antitumor Ther. 2020;1(5):381–390. doi: 10.37349/etat.2020.00023.
- Klein V. G., Bond A. G., Craigon C., Lokey R. S., Ciulli A.. Amide-to-Ester Substitution as a Strategy for Optimizing PROTAC Permeability and Cellular Activity. J. Med. Chem. 2021;64(24):18082–18101. doi: 10.1021/acs.jmedchem.1c01496.
- Yokoo H., Shibata N., Endo A., Ito T., Yanase Y., Murakami Y., Fujii K., Hamamura K., Saeki Y., Naito M., Aritake K., Demizu Y.. Discovery of a Highly Potent and Selective Degrader Targeting Hematopoietic Prostaglandin D Synthase via in Silico Design. J. Med. Chem. 2021;64(21):15868–15882. doi: 10.1021/acs.jmedchem.1c01206.
- Wurz R. P., Dellamaggiore K., Dou H., Javier N., Lo M. C., McCarter J. D., Mohl D., Sastri C., Lipford J. R., Cee V. J.. A “Click Chemistry Platform” for the Rapid Synthesis of Bispecific Molecules for Inducing Protein Degradation. J. Med. Chem. 2018;61(2):453–461. doi: 10.1021/acs.jmedchem.6b01781.
- Han X., Wang C., Qin C., Xiang W., Fernandez-Salas E., Yang C. Y., Wang M., Zhao L., Xu T., Chinnaswamy K., Delproposto J., Stuckey J., Wang S.. Discovery of ARD-69 as a Highly Potent Proteolysis Targeting Chimera (PROTAC) Degrader of Androgen Receptor (AR) for the Treatment of Prostate Cancer. J. Med. Chem. 2019;62(2):941–964. doi: 10.1021/acs.jmedchem.8b01631.
- Cyrus K., Wehenkel M., Choi E.-Y., Han H.-J., Lee H., Swanson H., Kim K.-B.. Impact of Linker Length on the Activity of PROTACs. Mol. BioSyst. 2010;7(2):359–364. doi: 10.1039/C0MB00074D.
- Poongavanam V., Atilaw Y., Siegel S., Giese A., Lehmann L., Meibom D., Erdelyi M., Kihlberg J.. Linker-Dependent Folding Rationalizes PROTAC Cell Permeability. J. Med. Chem. 2022;65(19):13029–13040. doi: 10.1021/acs.jmedchem.2c00877.
- Foley C. A., Potjewyd F., Lamb K. N., James L. I., Frye S. V.. Assessing the Cell Permeability of Bivalent Chemical Degraders Using the Chloroalkane Penetration Assay. ACS Chem. Biol. 2020;15(1):290–295. doi: 10.1021/acschembio.9b00972.
- Colletti L. M., Liu Y., Koev G., Richardson P. L., Chen C. M., Kati W.. Methods to Measure the Intracellular Concentration of Unlabeled Compounds within Cultured Cells Using Liquid Chromatography/Tandem Mass Spectrometry. Anal. Biochem. 2008;383(2):186–193. doi: 10.1016/j.ab.2008.08.012.
- Nguyen T.-T.-L., Kim J. W., Choi H.-I., Maeng H.-J., Koo T.-S.. Development of an LC-MS/MS Method for ARV-110, a PROTAC Molecule, and Applications to Pharmacokinetic Studies. Molecules. 2022;27(6):1977. doi: 10.3390/molecules27061977.
- Abeje Y. E., Wieske L. H. E., Poongavanam V., Maassen S., Atilaw Y., Cromm P., Lehmann L., Erdelyi M., Meibom D., Kihlberg J.. Impact of Linker Composition on VHL PROTAC Cell Permeability. J. Med. Chem. 2025;68(1):638–657. doi: 10.1021/acs.jmedchem.4c02492.
- Bemis T. A., La Clair J. J., Burkart M. D.. Unraveling the Role of Linker Design in Proteolysis Targeting Chimeras. J. Med. Chem. 2021;64(12):8042–8052. doi: 10.1021/acs.jmedchem.1c00482.
- Dong Y., Ma T., Xu T., Feng Z., Li Y., Song L., Yao X., Ashby C. R., Hao G.-F.. Characteristic Roadmap of Linker Governs the Rational Design of PROTACs. Acta Pharm. Sin. B. 2024;14(10):4266–4295. doi: 10.1016/j.apsb.2024.04.007.
- Neeser R. M., Akdel M., Kovtun D., Naef L.. Reinforcement Learning-Driven Linker Design via Fast Attention-Based Point Cloud Alignment. arXiv. 2023:arXiv:2306.08166.
- Tan Y., Dai L., Huang W., Guo Y., Zheng S., Lei J., Chen H., Yang Y.. DRlinker: Deep Reinforcement Learning for Optimization in Fragment Linking Design. J. Chem. Inf. Model. 2022;62(23):5907–5917. doi: 10.1021/acs.jcim.2c00982.
- Li B., Ran T., Chen H.. 3D Based Generative PROTAC Linker Design with Reinforcement Learning. Brief Bioinform. 2023;24(5):bbad323. doi: 10.1093/bib/bbad323.
- Zheng S., Tan Y., Wang Z., Li C., Zhang Z., Sang X., Chen H., Yang Y.. Accelerated Rational PROTAC Design via Deep Learning and Molecular Simulations. Nat. Mach Intell. 2022;4(9):739–748. doi: 10.1038/s42256-022-00527-y.
- Kao C. T., Lin C. T., Chou C. L., Lin C. C.. Fragment Linker Prediction Using the Deep Encoder-Decoder Network for PROTACs Drug Design. J. Chem. Inf. Model. 2023;63(10):2918–2927. doi: 10.1021/acs.jcim.2c01287.
- Li F., Hu Q., Zhou Y., Yang H., Bai F.. DiffPROTACs Is a Deep Learning-Based Generator for Proteolysis Targeting Chimeras. Brief Bioinform. 2024;25(5):bbae358. doi: 10.1093/bib/bbae358.
- Guo J., Knuth F., Margreitter C., Janet J. P., Papadopoulos K., Engkvist O., Patronov A.. Link-INVENT: Generative Linker Design with Reinforcement Learning. Digital Discovery. 2023;2(2):392–408. doi: 10.1039/D2DD00115B.
- Poongavanam V., Kölling F., Giese A., Göller A. H., Lehmann L., Meibom D., Kihlberg J.. Predictive Modeling of PROTAC Cell Permeability with Machine Learning. ACS Omega. 2023;8(6):5901–5916. doi: 10.1021/acsomega.2c07717.
- Peteani G., Huynh M. T. D., Gerebtzoff G., Rodríguez-Pérez R.. Application of Machine Learning Models for Property Prediction to Targeted Protein Degraders. Nat. Commun. 2024;15(1):5764. doi: 10.1038/s41467-024-49979-3.
- Ge J., Li S., Weng G., Wang H., Fang M., Sun H., Deng Y., Hsieh C.-Y., Li D., Hou T.. PROTAC-DB 3.0: An Updated Database of PROTACs with Extended Pharmacokinetic Parameters. Nucleic Acids Res. 2025;53(D1):D1510–D1515. doi: 10.1093/nar/gkae768.
- Ishida S., Aasawat T., Sumita M., Katouda M., Yoshizawa T., Yoshizoe K., Tsuda K., Terayama K.. ChemTSv2: Functional Molecular Design Using de Novo Molecule Generator. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2023;13(6):e1680. doi: 10.1002/wcms.1680.
- Langevin M., Grebner C., Güssregen S., Sauer S., Li Y., Matter H., Bianciotto M.. Impact of Applicability Domains to Generative Artificial Intelligence. ACS Omega. 2023;8(25):23148–23167. doi: 10.1021/acsomega.3c00883.
- Kaneko H., Funatsu K.. Applicability Domains and Consistent Structure Generation. Mol. Inform. 2017;36(1–2):1600032. doi: 10.1002/minf.201600032.
- Yoshizawa T., Ishida S., Sato T., Ohta M., Honma T., Terayama K.. A Data-Driven Generative Strategy to Avoid Reward Hacking in Multi-Objective Molecular Design. Nat. Commun. 2025;16(1):2409. doi: 10.1038/s41467-025-57582-3.
- Zheng X., Ji N., Campbell V., Slavin A., Zhu X., Chen D., Rong H., Enerson B., Mayo M., Sharma K., Browne C. M., Klaus C. R., Li H., Massa G., McDonald A. A., Shi Y., Sintchak M., Skouras S., Walther D. M., Yuan K., Zhang Y., Kelleher J., Liu G., Luo X., Mainolfi N., Weiss M. M.. Discovery of KT-474–a Potent, Selective, and Orally Bioavailable IRAK4 Degrader for the Treatment of Autoimmune Diseases. J. Med. Chem. 2024;67(20):18022–18037. doi: 10.1021/acs.jmedchem.4c01305.
- Hollmann N., Müller S., Purucker L., Krishnakumar A., Körfer M., Hoo S. B., Schirrmeister R. T., Hutter F.. Accurate Predictions on Small Data with a Tabular Foundation Model. Nature. 2025;637(8045):319–326. doi: 10.1038/s41586-024-08328-6.
- Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q., Liu T.-Y.. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017;30:3146–3154.
- Erickson N., Mueller J., Shirkov A., Zhang H., Larroy P., Li M., Smola A.. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv. 2020:arXiv:2003.06505.
- Filippakopoulos P., Qi J., Picaud S., Shen Y., Smith W. B., Fedorov O., Morse E. M., Keates T., Hickman T. T., Felletar I., Philpott M., Munro S., McKeown M. R., Wang Y., Christie A. L., West N., Cameron M. J., Schwartz B., Heightman T. D., La Thangue N., French C. A., Wiest O., Kung A. L., Knapp S., Bradner J. E.. Selective Inhibition of BET Bromodomains. Nature. 2010;468(7327):1067–1073. doi: 10.1038/nature09504.
- Galdeano C., Gadd M. S., Soares P., Scaffidi S., Van Molle I., Birced I., Hewitt S., Dias D. M., Ciulli A.. Structure-Guided Design and Optimization of Small Molecules Targeting the Protein–Protein Interaction between the von Hippel–Lindau (VHL) E3 Ubiquitin Ligase and the Hypoxia Inducible Factor (HIF) Alpha Subunit with in Vitro Nanomolar Affinities. J. Med. Chem. 2014;57(20):8657–8663. doi: 10.1021/jm5011258.
- Min J., Mayasundari A., Keramatnia F., Jonchere B., Yang S. W., Jarusiewicz J., Actis M., Das S., Young B., Slavish J., Yang L., Li Y., Fu X., Garrett S. H., Yun M., Li Z., Nithianantham S., Chai S., Chen T., Shelat A., Lee R. E., Nishiguchi G., White S. W., Roussel M. F., Potts P. R., Fischer M., Rankovic Z.. Phenyl-Glutarimides: Alternative Cereblon Binders for the Design of PROTACs. Angew. Chem. Int. Ed. 2021;60(51):26663–26670. doi: 10.1002/anie.202108848.
- Robbins D. W., Noviski M. A., Tan Y. S., Konst Z. A., Kelly A., Auger P., Brathaban N., Cass R., Chan M. L., Cherala G., Clifton M. C., Gajewski S., Ingallinera T. G., Karr D., Kato D., Ma J., McKinnell J., McIntosh J., Mihalic J., Murphy B., Panga J. R., Peng G., Powers J., Perez L., Rountree R., Tenn-McClellan A., Sands A. T., Weiss D. R., Wu J., Ye J., Guiducci C., Hansen G., Cohen F.. Discovery and Preclinical Pharmacology of NX-2127, an Orally Bioavailable Degrader of Bruton’s Tyrosine Kinase with Immunomodulatory Activity for the Treatment of Patients with B Cell Malignancies. J. Med. Chem. 2024;67(4):2321–2336. doi: 10.1021/acs.jmedchem.3c01007.
- Zhang C., Han X. R., Yang X., Jiang B., Liu J., Xiong Y., Jin J.. Proteolysis Targeting Chimeras (PROTACs) of Anaplastic Lymphoma Kinase (ALK). Eur. J. Med. Chem. 2018;151:304–314. doi: 10.1016/j.ejmech.2018.03.071.
- Winter G. E., Mayer A., Buckley D. L., Erb M. A., Roderick J. E., Vittori S., Reyes J. M., di Iulio J., Souza A., Ott C. J., Roberts J. M., Zeid R., Scott T. G., Paulk J., Lachance K., Olson C. M., Dastjerdi S., Bauer S., Lin C. Y., Gray N. S., Kelliher M. A., Churchman L. S., Bradner J. E.. BET Bromodomain Proteins Function as Master Transcription Elongation Factors Independent of CDK9 Recruitment. Mol. Cell. 2017;67(1):5–18.e19. doi: 10.1016/j.molcel.2017.06.004.
- Nowak R. P., DeAngelo S. L., Buckley D., He Z., Donovan K. A., An J., Safaee N., Jedrychowski M. P., Ponthier C. M., Ishoey M., Zhang T., Mancias J. D., Gray N. S., Bradner J. E., Fischer E. S.. Plasticity in Binding Confers Selectivity in Ligand-Induced Protein Degradation. Nat. Chem. Biol. 2018;14(7):706–714. doi: 10.1038/s41589-018-0055-y.
- Atilaw Y., Poongavanam V., Svensson Nilsson C., Nguyen D., Giese A., Meibom D., Erdelyi M., Kihlberg J.. Solution Conformations Shed Light on PROTAC Cell Permeability. ACS Med. Chem. Lett. 2021;12(1):107–114. doi: 10.1021/acsmedchemlett.0c00556.
- Wolf G., Craigon C., Teoh S. T., Essletzbichler P., Onstein S., Cassidy D., Uijttewaal E. C. H., Dvorak V., Cao Y., Bensimon A., Elling U., Ciulli A., Superti-Furga G.. The Efflux Pump ABCC1/MRP1 Constitutively Restricts PROTAC Sensitivity in Cancer Cells. Cell Chem. Biol. 2025;32(2):291–306.e6. doi: 10.1016/j.chembiol.2024.11.009.
- Kurimchak A. M., Herrera-Montávez C., Montserrat-Sangrà S., Araiza-Olivera D., Hu J., Neumann-Domer R., Kuruvilla M., Bellacosa A., Testa J. R., Jin J., Duncan J. S.. The Drug Efflux Pump MDR1 Promotes Intrinsic and Acquired Resistance to PROTACs in Cancer Cells. Sci. Signal. 2022;15(749):eabn2707. doi: 10.1126/scisignal.abn2707.
- Weininger D.. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988;28(1):31–36. doi: 10.1021/ci00057a005.
- Moriwaki H., Tian Y.-S., Kawashita N., Takagi T.. Mordred: A Molecular Descriptor Calculator. J. Cheminf. 2018;10(1):4. doi: 10.1186/s13321-018-0258-y.
- Rogers D., Hahn M.. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010;50(5):742–754. doi: 10.1021/ci100050t.
- Morgan H. L.. The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965;5(2):107–113. doi: 10.1021/c160017a018.
- Landrum G.. RDKit: Open-Source Cheminformatics. https://www.rdkit.org (accessed 2026-01-07).
- Akiba T., Sano S., Yanase T., Ohta T., Koyama M.. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; ACM: New York, 2019; pp 2623–2631.
- Emmanuel N., Hadrien M., Kyle M. K., Shawn W., Julien S., Honoré H., Michael C.. Datamol-Io/Medchem: Molecular Filtering for Drug Discovery. https://zenodo.org/records/14588938 (accessed 2026-01-07).
- Perron Q., Mirguet O., Tajmouati H., Skiredj A., Rojas A., Gohier A., Ducrot P., Bourguignon M., Sansilvestri-Morel P., Do Huu N., Gellibert F., Gaston-Mathé Y.. Deep Generative Models for Ligand-based de Novo Design Applied to Multi-parametric Optimization. J. Comput. Chem. 2022;43(10):692–703. doi: 10.1002/jcc.26826.
- Tetko I. V., Sushko I., Pandey A. K., Zhu H., Tropsha A., Papa E., Öberg T., Todeschini R., Fourches D., Varnek A.. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena Pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection. J. Chem. Inf. Model. 2008;48(9):1733–1746. doi: 10.1021/ci800151m.
- Kar S., Roy K., Leszczynski J.. Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling. Methods Mol. Biol. 2018;1800:141–169. doi: 10.1007/978-1-4939-7899-1_6.