Abstract
We present a general framework for the de novo design of small molecules with desirable chemical properties, developed to aid the creation of novel chemical entities with potential therapeutic use. The system is built upon a foundational Large Language Model, trained on a large, comprehensive chemical database, that is capable of generating structurally diverse and synthetically accessible compounds. It is then fine-tuned through reinforcement learning to enhance its capacity to generate molecules tailored to specific biological targets. As a case study, we apply this framework to design molecules targeting key proteins involved in Alzheimer’s disease. The generated compounds underwent molecular docking studies to assess their binding affinities and prioritize candidates with optimal predicted interactions. The top-ranked molecules were further analyzed based on their binding modes and key molecular interactions with the target proteins. The results suggest that our generative model produces viable, drug-like molecules with favorable interactions, underscoring its potential as a valuable tool in early-stage drug discovery.


Introduction
Recent advances in artificial intelligence (AI) have radically changed drug discovery. Given the immensity of the chemical space, AI tools and methods have become essential for exploring that space, generating novel molecules that interact with relevant protein pockets, and identifying patterns in molecular data to optimize desired pharmacological properties, to name but a few uses.
Three major approaches exist to address the challenge of designing new molecules that effectively bind to specific protein pockets. The first, the ligand-based approach, starts with a set of active molecules from which the main features (substructures responsible for activity) are extracted. Once obtained, these features are combined to create new molecules. The effectiveness of this approach has been extensively proven experimentally. However, its major drawback is that a database of active molecules for a given protein is not always available, which inherently limits the exploration of chemical space. The second approach is structure-based: one starts with the protein pocket of interest, which is input to a model that then designs new molecules for that protein. The main issue with this approach is its tendency to generate unrealistic molecules, typically neglecting crucial aspects such as nontoxicity or drug-likeness. As a consequence, while the generated molecules may initially appear promising, most ultimately fail to meet the criteria necessary to qualify as viable drug candidates. Although this approach has gained popularity in recent years and has already provided successful cases, it still needs to be validated through biochemical assays. Finally, the third approach treats the problem as an optimization task, in which an agent trained with reinforcement learning (RL) learns to produce molecules that bind with high affinity to a protein pocket. With the availability of docking tools, which simulate the binding affinity between a molecule and a given protein, the model can learn to produce high-affinity molecules. This approach can overcome the inherent limitations of the other two methods and address additional challenges, such as optimizing around a predefined scaffold or designing multitarget compounds capable of effectively interacting with multiple protein pockets.
Large Language Models (LLMs), built upon the Transformer architecture, have shown exceptional capabilities in capturing complex dependencies within sequences across multiple domains through their built-in attention mechanism. In our domain of molecular generation, this makes them highly suitable both for predicting molecular properties and for generating novel molecules, as several recent studies reflect. RL, especially the Proximal Policy Optimization (PPO) algorithm, has become essential in the fine-tuning of LLMs. In drug discovery, RL can guide the model to produce molecules with optimal properties of interest, significantly enhancing the chances of developing effective drug candidates.
This paper introduces MolOrgGPT, a family of three foundational models for molecular generation. We develop three variants of a Generative Pretrained Transformer (GPT-2) trained on 800 M molecules from the ZINC20 database, one of the most comprehensive data sets available for molecular LLM training. To our knowledge, this represents the largest chemical data set and among the largest generative models trained so far for de novo molecular design. Furthermore, using RL, we develop ADMolOrgGPT models specifically designed to generate novel molecules targeting protein pockets associated with Alzheimer’s Disease (AD). This approach demonstrates the potential of GPT models to capture patterns in chemical language that enable the optimization of docking scores against a protein pocket, something that has rarely been explored due to the high computational cost of docking and the challenges of integrating it within generative pipelines.
Our strategy consists of two main stages:
1. Pretrain decoder-only transformer models. In this stage, we developed MolOrgGPT-M, -L, and -XL, based on the GPT-2 architecture, using a large subset of the ZINC20 database. Its molecules were available in the popular Simplified Molecular Input Line Entry System (SMILES) format but, as Section Less Successful Attempts explains, we ultimately adopted a variant of this representation called Self-Referencing Embedded Strings (SELFIES), which ensures the chemical validity of all generated molecules through a set of predefined rules preventing valence violations. This stage has the following desirable outcomes: increasing the chemical diversity of generated molecules, providing richer latent representations for molecular property prediction, and delivering foundational models for drug discovery task-specific fine-tuning.
2. Reinforcement Learning for guided molecular generation. Building upon our pretrained MolOrgGPT models, we employed a PPO-RL algorithm to guide molecular generation. Given our interest in neurodegenerative diseases, we focused on generating molecules targeting three protein pockets known to be associated with AD: Dual-Specificity Tyrosine-phosphorylation-Regulated Kinase 1A (DYRK1A), β-Secretase 1 (BACE-1), and Butyrylcholinesterase (BuChE).
We compare the performance of our approach against two baseline models, Pocket2Mol and DrugGPT, which use a pocket-conditioned and a ligand-conditioned approach, respectively. Our results suggest that ADMolOrgGPT stands out in generating molecules with the desired properties. Additionally, our model excels in scaffold-constrained generation, learning to produce high-affinity molecules for a protein pocket based on a predefined scaffold. Furthermore, the flexibility of the PPO reward function allows us to generate molecules simultaneously optimized for multiple targets, an innovative capability in the context of pocket-based drug design. To further assess and confirm the binding capabilities of the designed compounds, molecular docking simulations were undertaken with the highest-scoring molecules produced by the generative model.
Methods
MolOrgGPT: Foundational LLMs for Molecular Generation
Transformers constitute a well-known deep neural network architecture with three main variants: encoder-only, decoder-only, and encoder–decoder models. In the field of molecular generation, many existing approaches have utilized an encoder to extract features from a protein pocket, followed by a decoder that generates candidate molecules. However, these methods frequently encounter challenges due to the limited availability of protein–ligand affinity data and the difficulty of integrating RL to fine-tune relevant properties of the generated molecules. Examples of encoder–decoder models include TamGen and Lingo3dMol.
Encoder-only models include Bidirectional Encoder Representations from Transformers (BERT), the Robustly Optimized BERT Pretraining Approach (RoBERTa), and chemical domain-specific models available at Hugging Face, including Roberta-Zinc and ChemBERTa. These models are designed to process input sequences bidirectionally, making them effective for tasks such as classification and representation learning of molecules.
Conversely, decoder-only models, such as MolGPT or DrugGPT, have shown promising results in molecular generation tasks. Crucially, the decoder-only architecture can be integrated directly into an RL workflow for optimization, motivating its choice in our design.
Model Architecture
We employ a decoder-only Transformer architecture based on Karpathy’s nanoGPT codebase. Each input symbol, a SELFIES token, is mapped into an embedding vector and combined with positional encoding to retain token order information. These representations flow through a stack of Transformer blocks. Each block contains a multihead self-attention layer that allows every token to attend to all preceding tokens, followed by a position-wise feed-forward network that captures complex feature interactions. Finally, a linear layer converts the last hidden state at each position into a probability distribution over the vocabulary to predict the next token. To keep attention focused, we include two complementary masks:
1. Look-ahead mask, preventing the model from attending to future tokens.
2. Padding mask, ensuring that tokens added to equalize sequence lengths, represented with the [PAD] token, are ignored and do not influence the chemical tokens.
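As an illustration, the combined effect of both masks can be sketched in plain Python. This is an assumed minimal setup, not the paper’s implementation; `PAD_ID` is a hypothetical padding token ID:

```python
# Illustrative sketch of the two attention masks for one tokenized sequence.
# True means "query position i may attend to key position j".
PAD_ID = 0  # hypothetical ID of the [PAD] token

def attention_mask(token_ids):
    n = len(token_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):           # query position
        for j in range(n):       # key position
            causal_ok = j <= i                 # look-ahead mask: no future tokens
            not_pad = token_ids[j] != PAD_ID   # padding mask: ignore [PAD] keys
            mask[i][j] = causal_ok and not_pad
    return mask
```

In a real Transformer, this boolean matrix would be converted to additive −∞ values applied to the attention logits before the softmax.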
MolOrgGPT-M, L, and XL share the overall architecture described above but, as seen in Table 1, differ in the number of transformer blocks (Layers), the dimensionality of the initial embedding layer (d model), and the number of attention heads (Heads).
Table 1. Architecture Hyperparameters for MolOrgGPT Variants.
| model | parameters | layers | d model | heads |
|---|---|---|---|---|
| MolOrgGPT-M | 302M | 24 | 1024 | 16 |
| MolOrgGPT-L | 702M | 36 | 1280 | 20 |
| MolOrgGPT-XL | 1474M | 48 | 1600 | 25 |
For computational convenience, the output layer dimension was set to 512 (a power of two, which aligns with common GPU memory-alignment and parallelization optimizations). Further details regarding these design choices are available in Section Training process.
Data Set
We utilized a subset of the ZINC20 database, containing approximately 800 M molecules in SMILES format with molecular weights ranging from 250 to 500 Da and a partition coefficient (LogP) between 1 and 4.5. This selection aimed to train a molecular generation model that adheres to the well-known Lipinski’s rule of five, to ensure that the resulting molecules possess physicochemical properties suitable for drug discovery.
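The stated selection criteria amount to a simple descriptor filter. A minimal sketch follows, assuming the molecular weight (Da) and LogP values are already computed (in practice, e.g., with RDKit’s Descriptors module); the function name and signature are illustrative:

```python
# Sketch of the ZINC20 subset filter described above: keep molecules with
# molecular weight in [250, 500] Da and LogP in [1, 4.5].
def passes_filter(mw: float, logp: float) -> bool:
    return 250.0 <= mw <= 500.0 and 1.0 <= logp <= 4.5
```

Applied over the full database, such a filter biases the pretraining corpus toward Lipinski-compliant, drug-like chemistry.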
Training Process
The first step in training LLMs is typically a tokenization process. Choosing an appropriate tokenizer is critical, since LLM outputs vary considerably depending on this choice. We considered three tokenizers:
Character-level SMILES tokenizer, splitting a SMILES string into individual characters (resembling Markov’s approach when he introduced Markov chains).
Byte-pair-encoding (BPE) SMILES tokenizer, which merges common character sequences into grouped tokens. This approach is widely used in natural language processing.
Bracket-level SELFIES tokenizer. This representation uses SELFIES, a robust alternative to SMILES, splitting a SELFIES string at each closing bracket so that every token corresponds to a complete bracketed SELFIES symbol.
After training the LLMs using these different tokenizers, we concluded that the bracket-level SELFIES one was the most appropriate choice for our purposes: a key advantage of SELFIES was that it consistently generates syntactically valid molecules, something of major relevance for our subsequent RL stage. Further details on our experiments with the other two tokenizers can be found in the section Less Successful Attempts.
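Because every SELFIES symbol is a complete bracketed unit, a bracket-level tokenizer reduces to splitting the string at each closing bracket. A minimal sketch (the `selfies` Python library offers a similar `split_selfies` helper):

```python
# Split a SELFIES string at each closing bracket so that every token is one
# complete bracketed symbol.
def tokenize_selfies(s: str) -> list:
    return [tok + "]" for tok in s.split("]") if tok]

tokens = tokenize_selfies("[C][=C][Branch1][Ring1]")
# tokens == ["[C]", "[=C]", "[Branch1]", "[Ring1]"]
```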
To tokenize the molecules in our data set, we added three special symbols: [PAD], [SOS], and [EOS] (see Table 2), denoting padding, start of sequence, and end of sequence, respectively.
Table 2. Vocabulary Composition.
| category | example | purpose |
|---|---|---|
| Special | [SOS] | Sequence control |
| Branch (ramifications) | [Branch1] | Open branch X positions back |
| Ring (ring closures) | [Ring2] | Close ring X positions back |
| Pure atoms | [=C] | Add atom with bond order |
| Stereochemistry (chiral centers) | [C@@H1] | Chiral center orientation |
| Directed bond (“/” or “\”) | [/C] | Bond direction |
| Formal charge/explicit H | [NH1] | Charge or explicit H |
Figure 1 illustrates the tokenization pipeline for a single molecule: the SMILES string is converted to SELFIES and then tokenized into model input tokens. Once the 800 M molecules were converted into SELFIES and tokenized, we filtered the data set to retain only those molecules containing at most 82 tokens, including the [SOS] and [EOS] tokens, since this threshold covers 97.5% of the entire data set. Molecules shorter than 82 tokens were padded with the [PAD] token.
Figure 1. Tokenization pipeline example.
We then trained three separate models (MolOrgGPT-M, MolOrgGPT-L, and MolOrgGPT-XL) using an Adam optimizer with a learning-rate scheduler. For computational reasons, the models were trained using stochastic sampling over a subset of the tokenized data instead of a full pass through the entire corpus, with early stopping when the training loss converged. Given that each effective batch per iteration contains 519168 tokens and the training allowed up to 10000 iterations, the maximum number of distinct molecules that could be sampled during pretraining was approximately 65M.
ADMolOrgGPTs: Guiding Molecular Generation Targeting Alzheimer’s Disease through RL
Once the foundational MolOrgGPT models were available, we applied RL to improve their generation capabilities so as to deliver molecules with the desired properties, in our case molecules that are more likely to dock with protein pockets recognized as AD targets. The effectiveness of this strategy is demonstrated in Results.
RL Algorithm
We used PPO to implement RL. PPO is a widely adopted method that aims to maximize the expected reward while ensuring that policy updates do not introduce drastic changes that could destabilize training.
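For reference, the core of PPO is its clipped surrogate objective, sketched below in plain Python for a single sample. This is an assumed minimal form for illustration; the paper does not detail its PPO implementation:

```python
# Clipped surrogate objective for one (state, action) sample.
# ratio = pi_new(a|s) / pi_old(a|s); advantage comes from the reward signal.
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # PPO maximizes min(unclipped, clipped); as a loss we return its negative.
    return -min(unclipped, clipped)
```

Clipping the probability ratio to [1 − ε, 1 + ε] is what prevents a single large reward from pushing the policy far from its previous behavior.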
Reward Modeling
Reward modeling is a key step when applying RL. In our case, to guide the generation process we used normalized reward functions based on the logistic function, mapping input values to the interval [0,1]. This guarantees numerical stability and provides smooth gradients for PPO. By rescaling the input range of the logistic function, we control the steepness of the transition from low to high reward values, effectively tuning how sharply the model differentiates between poor and good docking scores.
Single-Pocket Reward
For the single-pocket formulation, used in Sections Single Pocket-Based Molecular Generation and Scaffold-Constrained Pocket-Based Generation, we linearly rescaled the empirical docking range [−12, −6] kcal/mol, provided by Vina, to [−6, 6], i.e., a docking score x is mapped to x′ = 2x + 18. The rescaled score is passed through the logistic function (eq 1), ensuring a clearer separation between weak and strong binders,

σ(x′) = 1/(1 + e^(−x′))    (1)

and finally transformed into the reward (eq 2):

R_dock = 1 − σ(x′)    (2)

so that a higher R_dock corresponds to a better (more negative) docking score.
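Putting the rescaling and logistic steps together, the single-pocket reward can be sketched as:

```python
import math

# Single-pocket reward: rescale a Vina docking score from [-12, -6] kcal/mol
# to [-6, 6], apply the logistic function, and invert so that more negative
# (stronger) docking scores give rewards closer to 1.
def dock_reward(x: float) -> float:
    x_scaled = 2.0 * x + 18.0                  # maps -12 -> -6 and -6 -> 6
    sigma = 1.0 / (1.0 + math.exp(-x_scaled))  # logistic function
    return 1.0 - sigma                         # invert: more negative is better
```

A strong binder at −12 kcal/mol yields a reward near 1, a weak binder at −6 kcal/mol a reward near 0, and the midpoint −9 kcal/mol gives exactly 0.5.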
Two-Pocket Reward
The two-pocket reward variant, employed in Section Two Pocket-Based Molecular Generation, linearly aggregates the single-pocket rewards obtained from the docking scores x1 and x2 of the two protein pockets following eq 3:

R = α R_dock(x1) + β R_dock(x2)    (3)

As is standard when aggregating objectives in multiobjective RL, we reflect equal preference between both pockets by setting α = β = 0.5 in Section Two Pocket-Based Molecular Generation.
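A sketch of this aggregation, repeating the single-pocket logistic reward so the snippet stays self-contained:

```python
import math

def dock_reward(x: float) -> float:
    # Single-pocket logistic reward, repeated here for self-containment.
    return 1.0 - 1.0 / (1.0 + math.exp(-(2.0 * x + 18.0)))

# Two-pocket reward: a weighted combination of the per-pocket rewards.
def two_pocket_reward(x1: float, x2: float,
                      alpha: float = 0.5, beta: float = 0.5) -> float:
    return alpha * dock_reward(x1) + beta * dock_reward(x2)
```

Because the per-pocket rewards are already normalized to [0,1], the weights α and β directly express the relative preference between targets.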
Computational Cost
The large-scale training phase was carried out on eight NVIDIA GPUs, each equipped with 40 GB of high-bandwidth memory, for 10000 iterations. The total training times are reported in Table 3:
Table 3. Time Required for Pretraining Models.
| model | time (h) |
|---|---|
| M | 40.85 |
| L | 71.32 |
| XL | 95.98 |
The RL stage requires repeated docking evaluations and forward passes through the model. Because the computational load depends on model size, RL was run using 1 GPU for models M and L, and 2 GPUs for the XL model. For all models, the time per RL epoch remained stable at approximately 27 min. In general, the XL model converged faster to an optimal solution, but as a trade-off, it required more computation time during pretraining.
Docking Score Benchmarking
After applying the RL workflow, the highest-scoring molecules it proposed were re-docked using the Schrödinger suite to test their validity and analyze the interactions established between protein and ligand. Their docking scores were also compared with those of reference molecules known to effectively bind DYRK1A, BuChE, and BACE-1.
To conduct the docking analysis and determine the most favorable binding poses at the target site, in a first stage the SMILES strings generated by the LLM were transformed into three-dimensional structures using the LigPrep utility embedded in the Schrödinger suite, which uses the Epik module to generate the feasible ionization states at physiological pH. Energy minimization with the OPLS4 force field was performed to refine these structures. In parallel, the receptor proteins were prepared using the Protein Preparation Workflow, which includes assignment of bond orders, assignment of protonation states under physiological conditions, hydrogen-bond formation, and a final structural optimization with the OPLS4 force field. The grid for generating the docking poses of the proposed ligands was centered at the position of the ligand in the reference crystal structure (the 5TOL, 6EIF, and 7Q1P protein structures for BACE-1, DYRK1A, and BuChE, respectively). Finally, the relevant ligand–protein interactions of the resulting highest-scoring poses were analyzed.
Results
The following subsections present the results attained with the ADMolOrgGPT-XL model for various generation tasks to enhance clarity. As the largest model, it generally outperformed the other two models. A comprehensive comparison of performance according to model size is in Section Model Comparison According to Size. The best molecules generated with different strategies are reported in Tables S1–S6 of the Supporting Information.
Single Pocket-Based Molecular Generation
We evaluate the performance of the proposed models in generating molecules against the three proposed protein pockets. In addition, we compare their results with those of Pocket2Mol and DrugGPT, two widely used benchmark models with publicly available code. To ensure fair comparison, we adopt standardized assessment protocols.
Each model is assessed according to nine complementary criteria relevant in drug development:
Vina Docking Score (kcal/mol). The binding affinity calculated with AutoDock Vina through the AutoGrow4 workflow.
Quantitative Estimate of Drug-Likeness (QED). A score ranging from 0 to 1, calculated using RDKit’s QED module.
LogP. The compound’s lipophilicity obtained with RDKit Descriptors MolLogP.
Synthetic-Accessibility Score (SAS). It measures how easily a molecule can be synthesized, with scores ranging from 1 (easy) to 10 (difficult). It is computed using the RDKit contrib script sascorer.
Molecular Diversity. Defined as D = 1 − ⟨T⟩, where ⟨T⟩ is the mean pairwise Tanimoto similarity between 2048-bit Extended Connectivity Fingerprints (ECFP4) of the molecules.
Toxicity. Assessed using Chemprop, a message-passing neural network that predicts the probability of toxicity across 12 Tox21 end points. A molecule passes the toxicity filter if its predicted probability is below 0.5 for all 12 end points.
Validity, Uniqueness, and Novelty. Measured following the GuacaMol benchmark: validity is the fraction of syntactically valid SMILES; uniqueness is the proportion of nonduplicate valid molecules; novelty is the proportion of generated unique molecules absent from the reference database.
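The diversity criterion above can be sketched as follows; for illustration, fingerprints are represented as plain Python sets of on-bit indices rather than the 2048-bit ECFP4 vectors (e.g., from RDKit) used in the paper:

```python
# Diversity D = 1 - mean pairwise Tanimoto similarity over all molecule pairs.
def tanimoto(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0

def diversity(fps: list) -> float:
    pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps))]
    mean_sim = sum(tanimoto(fps[i], fps[j]) for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim
```

A batch of identical molecules has diversity 0, while a batch of molecules sharing no fingerprint bits has diversity 1.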
Broadly speaking, the desired properties of the generated molecules include a high binding affinity, reflected by a low (more negative) Vina docking score; high drug-likeness, indicated by a high QED value; a LogP within the optimal range of 0 to 5; a low SAS score, suggesting ease of synthesis; high molecular diversity; and nontoxicity across all 12 Tox21 end points.
Table 4 presents the results for a single generation batch of 256 molecules per protein pocket for the three models under evaluation. The results suggest that our model outperforms the baselines on several key properties while remaining comparable on the rest. ADMolOrgGPT achieves docking scores slightly less negative than, yet comparable to, the two baselines. This is significant considering that Pocket2Mol and DrugGPT were trained on the CrossDocked and jglaser/binding_affinity data sets, respectively, which consist solely of protein–ligand complexes, whereas our model was trained only on ZINC20 and optimized via RL. In turn, our model surpasses both baselines in terms of nontoxicity, QED, and the percentage of molecules with LogP between 0 and 5. Lastly, although all models show comparable diversity, ADMolOrgGPT and Pocket2Mol achieve the highest validity, uniqueness, and novelty in generation. In Section Analysis of Baseline Training Data Sets we discuss the inherent generative bias of the models due to the different training data sets and its implications for the docking scores and physicochemical properties of the generated molecules.
Table 4. Model Evaluation across Molecular Properties.
| Pocket2Mol | DrugGPT | ADMolOrgGPT | |
|---|---|---|---|
| Vina Score (kcal/mol, ↓) | –8.60 ± 1.33 | –8.81 ± 1.21 | –8.29 ± 1.22 |
| QED (↑) | 0.64 ± 0.16 | 0.59 ± 0.18 | 0.68 ± 0.16 |
| LogP (%, 0–5, ↑) | 86.72 ± 1.23 | 82.42 ± 1.37 | 92.97 ± 0.92 |
| SAS (↓) | 3.37 ± 0.95 | 3.26 ± 0.98 | 3.31 ± 1.02 |
| Diversity (↑) | 0.87 ± 0.05 | 0.86 ± 0.05 | 0.86 ± 0.04 |
| Toxicity (%, ↓) | 36.07 ± 1.73 | 29.04 ± 1.64 | 8.59 ± 1.01 |
| Validity (%, ↑) | 99.09 ± 0.34 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Uniqueness (%, ↑) | 98.96 ± 0.37 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Novelty (%, ↑) | 98.44 ± 0.45 | 99.61 ± 0.23 | 100.00 ± 0.00 |
↑ Higher is better. ↓ Lower is better.
Scaffold-Constrained Pocket-Based Generation
Scaffolds in drug development are essential molecular structures that serve as building blocks for designing new drugs. Consequently, the ability to create new molecules starting from a scaffold as a foundation is highly important. As an example, we selected the scaffold in Figure 2. According to our internal group studies, molecules containing this substructure have demonstrated favorable docking scores in Vina simulations for the DYRK1A protein pocket. This section assesses ADMolOrgGPT’s ability to optimize molecular generation for a specific protein pocket while maintaining a given scaffold, which we designate scaffold-constrained molecular generation.
Figure 2. Scaffold selected for constrained pocket-based generation.
For this, we trained the model under scaffold constraints and evaluated docking improvements across epochs. This experimental setup allows us to directly assess how optimization proceeds under fixed structural cores. Results are presented in Figure 3, which shows the docking scores attained with our constrained two-stage process. The blue boxplot reflects the initial scores, whereas the orange one corresponds to those obtained after optimization. Note that after just five training epochs, the model learned to generate molecules that maintain the scaffold while considerably improving the docking score. Compared to the molecules produced by ADMolOrgGPT for the DYRK1A protein in the experiment described in Section Single Pocket-Based Molecular Generation, the mean docking score improved from −8.58 to −9.43 kcal/mol, a substantial performance gain. This outcome highlights the effectiveness of our approach, in which a predefined scaffold is identified and then optimized to generate molecules with better docking scores.
Figure 3. Scaffold-constrained docking optimization.
A key aspect of our methodology is the ability to track policy shifts in the LLM, allowing us to extract mechanistic insights from the RL process. As shown in eq 4, the Vina scoring function depends on Gaussian steric terms, repulsion interactions, hydrophobic contributions, hydrogen bonding, and ligand flexibility:

s = [ Σ_(i<j) ( w1·gauss1(d_ij) + w2·gauss2(d_ij) + w3·repulsion(d_ij) + w4·hydrophobic(d_ij) + w5·hbond(d_ij) ) ] / (1 + w_rot·N_rot)    (4)

where d_ij is the surface distance between atoms i and j, the w terms are fitted weights, and N_rot is the number of rotatable bonds.
Figure 4 shows how the probability of selected tokens evolves from epoch 0 to epoch 5 as the model improves the Vina docking score of the generated molecules. The DYRK1A pocket is predominantly hydrophobic and aromatic, with few relevant H-bond donors. Consequently, the loss of H-bond sites ([N] ↓, [O] ↓) has limited impact, whereas increases in rigidity and aromaticity ([Ring1] ↑, [=C] ↑, [=N] ↑) consistently enhance packing and the Gaussian terms. This drives the RL process toward more rigid, compact, and aromatic structures, which naturally achieve better docking scores.
Figure 4. Token probability evolution across epochs.
Two Pocket-Based Molecular Generation
A significant challenge and opportunity in modern drug discovery lies in designing molecules that effectively target multiple protein pockets, yet current state-of-the-art models for pocket-based generation are typically limited to a single protein pocket as input. Our literature review indicates that LigBuilder V3 utilizes a genetic algorithm to generate molecules targeting multiple protein pockets. However, we were unable to reproduce its code for experimental comparisons.
Here we assess whether ADMolOrgGPT can generate molecules that simultaneously target two protein pockets through a direct RL-based optimization strategy in a single generative process. We present the pairwise optimization results for DYRK1A, BuChE and BACE-1 using the two-pocket reward function from Section Reward Modeling.
Figure 5 illustrates that, for all protein pairs studied, the model enhances the generation of molecules with favorable docking scores. After only a few epochs of training (epoch 15 for the first two cases, and epoch 14 for the third, as it performed slightly better than epoch 15), the generated molecules showed improved docking scores for both protein pockets, as seen in Figure 5a (DYRK1A vs BuChE), Figure 5b (DYRK1A vs BACE-1), and Figure 5c (BuChE vs BACE-1). This is evident from the comparison between the blue boxplots (molecules without optimization) and the orange ones (molecules after optimization). The observed enhancements ranged from 3% to 9%, a meaningful improvement in docking performance.
Figure 5. Multitarget docking optimization across three protein pairs.
To assess whether our model generates genuinely new dual-active molecules, rather than combinations of known scaffolds, we applied the following procedure:
1. We collected all active molecules for DYRK1A and BuChE from ChEMBL.
2. We computed the Tanimoto similarity between the Morgan fingerprints of the ADMolOrgGPT-generated molecules and those in the ChEMBL sets.
3. Each generated molecule was labeled as novel or not based on a maximum similarity threshold of 0.4, and we then checked whether its scaffold was already present in ChEMBL.
Table 5 shows that almost all generated molecules are structurally novel, and even those with higher similarity still contain unseen scaffolds.
Table 5. Confusion Matrix between Molecule and Scaffold Novelty.
| scaffold novel | scaffold not novel | |
|---|---|---|
| Novel molecule | 241 | 5 |
| Not novel molecule | 10 | 0 |
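The novelty labeling step of the procedure above can be sketched as follows; as before, set-based fingerprints stand in for the Morgan fingerprints used in the paper:

```python
# A generated molecule is labeled novel when its maximum Tanimoto similarity
# to every known ChEMBL active stays below the 0.4 threshold.
def tanimoto(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0

def is_novel(gen_fp: set, known_fps: list, threshold: float = 0.4) -> bool:
    return max(tanimoto(gen_fp, fp) for fp in known_fps) < threshold
```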
Finally, Figure 6 summarizes representative outputs from ADMolOrgGPT under the three generation settings for the three protein pockets. All the molecules represented meet the pharmacological property filters specified above.
Figure 6. Representative molecules generated by ADMolOrgGPT across different tasks: (a) single pocket-based molecular generation; (b) scaffold-constrained pocket-based generation; (c) two pocket-based molecular generation.
Docking Benchmark
To validate the molecules generated through the proposed generative workflow, a molecular docking analysis was carried out using the Schrödinger suite, following the methodology detailed in Subsection Docking Score Benchmarking. The results, summarized in Tables S4–S6 of the Supporting Information, confirm that all newly generated molecules successfully dock into the binding sites of their respective protein targets.
Overall, the generated compounds exhibit docking scores comparable to those of the reference ligands, suggesting a similar or potentially improved binding affinity. The docking poses reveal that these molecules consistently produce key interactions with crucial active-site residues, many of which are also involved in binding with the crystallographic ligands.
Figure 7 shows that the proposed ligands fit into the pockets of the target proteins. Moreover, some of the interacting sites obtained for the generated molecules coincide with those obtained for the reference ligands, specifically those present in the crystallized 6EIF, 7Q1P, and 5TOL protein structures (see Figure S1 in the Supporting Information). For instance, in the case of the DYRK1A protein, Phe238 and Leu241 tend to form aromatic π–π stacking interactions and hydrogen bonds with the ligand, respectively. In the case of the BuChE protein, Trp82 and Phe329 also form aromatic π–π stacking interactions with the ligand in most of the predicted poses. As could be expected, some new interactions not present in the reference protein structures are established in the poses adopted by the generated molecules, thus enabling the exploration of new binding sites. Figure 7 also shows that the model is effectively learning to produce molecules that bind to specific regions and functional groups of the proteins. For example, the top-ranked multitarget molecules interact with common residues of the BuChE and DYRK1A proteins: for BuChE, the Phe329, Tyr332, Gly116, and Gly117 residues show common interactions across different molecules, while for DYRK1A this happens with the Asn244 and Lys167 residues.
Figure 7. Schrödinger suite representation of docking poses adopted by some of the generated molecules.
In particular, the multitarget ligand @22, designed to inhibit both DYRK1A and BuChE, exhibited favorable docking scores against both targets, suggesting high predicted binding affinities. Moreover, @22 shows key interactions, such as hydrogen bonding with Leu241 and Lys167 in DYRK1A, and the π–π stacking with Trp332 and hydrogen bonds with Gly116 and Gly117 in BuChE. Additionally, @22 established novel contacts absent in the crystallized complexes, potentially stabilizing its binding and enabling engagement with alternative subpockets, which may contribute to its dual inhibitory capacity.
Taken together, these findings highlight the strong binding potential of the designed ligands, including both those with selectivity toward a single target and multitarget candidates such as @22. In particular, @22 stands out for its ability to engage essential residues in both enzymes while also exploring new interaction possibilities that could enhance its efficacy and selectivity.
Discussion
We have showcased how our framework provides an innovative approach to address major limitations found in current protein pocket-based molecular generative models: their tendency to generate unrealistic or toxic molecules, their inability to optimize around a predefined scaffold, and their challenge when aiming for multitarget compounds capable of simultaneously addressing multiple protein pockets.
Unlike other models that depend on databases optimized for affinity scores, ADMolOrgGPT achieves results that are nearly on par with state-of-the-art methods like Pocket2Mol and DrugGPT, excelling in producing drug-like, nontoxic molecules with favorable LogP values. This demonstrates the effectiveness of our framework coupling transformer-based LLMs with RL in the domain of drug discovery.
We conclude by discussing relevant strategic aspects when implementing the framework, as well as some open issues.
Analysis of Baseline Training Data Sets
We compared the physicochemical property distributions of the data sets used by the baseline models. Following the guidelines of TargetDiff, we extracted the molecules from CrossDocked and combined them with those from the jglaser/binding_affinity data set, forming the mixed baseline distribution used for analysis. For comparison, we randomly sampled approximately 600,000 molecules from ZINC20 and from the mixed baseline distribution. Table 6 shows that the mixed baseline contains molecules with lower QED, higher LogP, greater synthetic difficulty, higher molecular weight, and a higher tendency toward toxicity relative to ZINC20. For docking evaluation purposes, 256 molecules from each data set were docked against the three pockets under study.
Table 6. Comparison of Molecular Property Distributions between the Mixed Baseline Data Set and ZINC20.
| Property | mixed baseline | ZINC20 |
|---|---|---|
| Docking (kcal/mol, ↓) | –8.57 ± 1.56 | –7.90 ± 1.04 |
| QED (↑) | 0.51 ± 0.21 | 0.74 ± 0.13 |
| LogP (%, 0–5, ↑) | 72.46 ± 0.06 | 99.99 ± 0.00 |
| SAS (↓) | 3.24 ± 0.93 | 3.18 ± 0.60 |
| MW (↓) | 438.31 ± 229.24 | 360.85 ± 54.46 |
| Toxicity (%, ↓) | 32.53 ± 0.06 | 6.48 ± 0.03 |
↑ Higher is better. ↓ Lower is better.
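The table entries are mean ± standard deviation summaries, with the LogP row reporting the percentage of molecules falling in the 0–5 range. A small sketch of how such summaries can be assembled (the property values below are placeholders, not the actual data sets):

```python
import statistics

def mean_std(values):
    """Return (mean, population std) for a list of property values."""
    return statistics.mean(values), statistics.pstdev(values)

def fraction_in_range(values, lo=0.0, hi=5.0):
    """Percentage of values within [lo, hi], as in the LogP (0-5) row."""
    inside = sum(lo <= v <= hi for v in values)
    return 100.0 * inside / len(values)

# Placeholder LogP values for illustration only.
logp = [1.2, 3.4, 4.9, 6.1, -0.3, 2.8]
m, s = mean_std(logp)
print(f"LogP: {m:.2f} +/- {s:.2f}, in-range: {fraction_in_range(logp):.1f}%")
```

The same two helpers, applied per property, reproduce the structure of each row in Table 6.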
According to the Vina scoring function (eq ), hydrophobic contacts and hydrogen bonding drive favorable docking scores. Consequently, Pocket2Mol and DrugGPT inherit a bias toward generating heavier and more hydrophobic molecules that naturally achieve better docking scores. In contrast, MolOrgGPT is pretrained on the filtered ZINC20 subset, which contains more drug-like molecules with moderate docking performance. From a drug-design perspective, this bias should be perceived as beneficial rather than limiting: early-stage discovery pipelines apply physicochemical filtering prior to synthesis. Pretraining MolOrgGPT on ZINC20 therefore aligns with standard practice: the base model learns to generate drug-like structures, while PPO-RL subsequently optimizes binding without compromising these more basic properties. Although MolOrgGPT does not incorporate structural protein information during pretraining, the PPO-RL stage effectively compensates for this, enabling the model to reach docking performances comparable to those of Pocket2Mol and DrugGPT.
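For intuition about why hydrophobic and hydrogen-bond contacts dominate, the intermolecular terms of the Vina scoring function are smooth functions of the surface distance d between an atom pair. The sketch below reproduces their functional forms from the AutoDock Vina paper; the weights are quoted approximately and the whole block is illustrative, not a docking implementation:

```python
import math

# Intermolecular terms of the Vina scoring function, as functions of the
# surface distance d (interatomic distance minus the sum of vdW radii).
# Functional forms follow Trott & Olson (2010); the weights below are
# approximate published values, quoted here for illustration only.

def gauss1(d):
    return math.exp(-((d / 0.5) ** 2))

def gauss2(d):
    return math.exp(-(((d - 3.0) / 2.0) ** 2))

def repulsion(d):
    return d * d if d < 0 else 0.0

def hydrophobic(d):
    if d <= 0.5:
        return 1.0
    if d >= 1.5:
        return 0.0
    return 1.5 - d  # linear ramp between 0.5 and 1.5

def hbond(d):
    if d <= -0.7:
        return 1.0
    if d >= 0.0:
        return 0.0
    return d / -0.7  # linear ramp between -0.7 and 0

WEIGHTS = (-0.0356, -0.00516, 0.840, -0.0351, -0.587)  # approximate

def pair_score(d, hydrophobic_pair=False, hbond_pair=False):
    """Contribution of one atom pair; negative values are favorable."""
    w1, w2, wr, wh, wb = WEIGHTS
    score = w1 * gauss1(d) + w2 * gauss2(d) + wr * repulsion(d)
    if hydrophobic_pair:
        score += wh * hydrophobic(d)
    if hbond_pair:
        score += wb * hbond(d)
    return score

# A hydrophobic contact near optimal surface distance scores favorably.
print(pair_score(0.3, hydrophobic_pair=True))
```

Because the hydrophobic and hydrogen-bond terms carry the largest negative weights, molecules with many such contacts (typically larger, greasier molecules) accumulate the most favorable scores, which is the bias discussed above.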
Model Comparison According to Size
As Figure 8 illustrates, and as is generally assumed in the LLM literature, increasing the number of parameters generally enhances performance. In our case, this means reducing the average docking scores of the generated molecules, making ADMolOrgGPT-XL the best-performing model. Interestingly, this model tends to converge in fewer epochs than its smaller variants, as Figure 9 shows. This faster convergence is important, as it suggests that the model can achieve high-quality solutions without deviating significantly from its original distribution.
Figure 8. Docking results comparison according to model size.
Figure 9. Best epoch comparison according to model size.
Less Successful Attempts
While developing MolOrgGPT, we made several less successful attempts that provide lessons for future related studies.
Molecular Property Prediction
We began by assessing the performance of MolOrgGPT-M, -L, and -XL on different molecular property prediction tasks. For benchmarking, we compared our models with the leading results of the Triplet Message Network (TrimNet). The choice between scaffold-based and random data splits was determined by the specific task at hand, following the guidelines set forth by MoleculeNet. Overall, although our model performed strongly, it did not surpass current state-of-the-art methods. This outcome is consistent with prior reports indicating that other architectures, such as Graph Neural Networks (GNNs), perform better on these tasks.
Tokenization
As discussed in the natural language processing literature, tokenization is key to the effective use of LLMs. The tokenizers initially considered are described in Section Training Process. Since the purpose of our study was to use RL to enhance pocket-based generation tasks, the use of a SMILES dictionary suffered from an inherent limitation: when the algorithm varied its policy, many of the newly generated molecules were not chemically valid. As a result, the models failed to improve on the property of interest. Switching to SELFIES mitigated this issue, since every SELFIES string decodes to a valid molecule.
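The failure mode can be illustrated without a chemistry toolkit: a necessary (though far from sufficient) condition for SMILES validity is that brackets balance and every ring-closure digit appears an even number of times, and even this crude check breaks under small token-level edits of the kind a shifting policy produces, whereas SELFIES is valid by construction. A sketch of that syntactic proxy only; real validation requires a toolkit such as RDKit:

```python
def crude_smiles_check(smiles):
    """Necessary-but-not-sufficient SMILES sanity check:
    balanced () and [] plus evenly paired ring-closure digits."""
    depth_paren = depth_bracket = 0
    ring_counts = {}
    for ch in smiles:
        if ch == "(":
            depth_paren += 1
        elif ch == ")":
            depth_paren -= 1
            if depth_paren < 0:
                return False  # closing a branch that was never opened
        elif ch == "[":
            depth_bracket += 1
        elif ch == "]":
            depth_bracket -= 1
            if depth_bracket < 0:
                return False
        elif ch.isdigit() and depth_bracket == 0:
            # Ring-closure digits (outside atom brackets) must pair up.
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
    return (depth_paren == 0 and depth_bracket == 0
            and all(n % 2 == 0 for n in ring_counts.values()))

print(crude_smiles_check("c1ccccc1O"))   # phenol: passes the proxy
print(crude_smiles_check("c1ccccc1O)"))  # one stray token already fails
```

One character's difference is enough to fail even this weak check, which is why policy perturbations over a SMILES vocabulary so often yield invalid strings.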
Multiple Properties Optimization
While our models can enhance the generation of molecules with specific properties, such as fixing a certain molecular weight, ensuring nontoxicity, or optimizing docking, they struggle when tasked with improving all of these characteristics simultaneously. Further work is therefore needed to adapt multiobjective RL methods and couple them with LLM-based molecular generation.
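A common stopgap is to scalarize the objectives into a single reward; the weighted sum below is the simplest such scheme. All names, weights, and property scores are illustrative, and this fixed trade-off is exactly what dedicated multiobjective RL methods aim to improve on:

```python
def scalarized_reward(props, weights):
    """Weighted-sum scalarization of multiple molecular objectives.

    props: dict of normalized scores in [0, 1], where 1 is always "better"
           (e.g. rescaled docking affinity, QED, 1 - toxicity probability).
    weights: dict with the same keys, summing to 1.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * props[k] for k in weights)

# Illustrative trade-off: a strong binder that is fairly toxic...
mol_a = {"docking": 0.9, "qed": 0.6, "nontoxic": 0.3}
# ...versus a moderate binder that is drug-like and safe.
mol_b = {"docking": 0.6, "qed": 0.8, "nontoxic": 0.9}
w = {"docking": 0.5, "qed": 0.25, "nontoxic": 0.25}

print(scalarized_reward(mol_a, w), scalarized_reward(mol_b, w))
```

The ranking flips as the weights change, which is the core difficulty: a single weight vector commits the agent to one point on the Pareto front before training even starts.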
PPO Hyperparameters Optimization
The PPO algorithm used is especially sensitive to the learning rate: if it is too high, the algorithm overshoots the solution; if it is too low, it fails to learn effectively. The number of training epochs is also relevant. The ADMolOrgGPT-XL model requires fewer epochs to reach an optimal solution than the ADMolOrgGPT-M and -L models. Moreover, if the model updates its weights for too many epochs, it risks reducing the diversity of the generated molecules, potentially collapsing to a single solution.
Future Lines
Several avenues for future investigation emerge from this research. Other RL methods, such as Augmented Hill Climb or regularized maximum likelihood estimation, may also fine-tune LLMs effectively. In addition, low-rank adaptation (LoRA) enables foundational models like MolOrgGPT to be efficiently fine-tuned on molecules that show high activity toward a specific protein of interest; this could allow the model to generate molecules with high activity values.
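LoRA freezes the pretrained weight matrix W and learns only a low-rank correction, W' = W + (alpha / r) * B A, so fine-tuning on a small set of target-active molecules touches far fewer parameters than full fine-tuning. A toy sketch of that update with plain nested lists (shapes and values are illustrative only):

```python
def matmul(a, b):
    """Naive matrix product of nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_update(W, A, B, alpha):
    """Apply the LoRA update W' = W + (alpha / r) * B @ A.

    W: frozen (d_out x d_in) pretrained weights.
    B: (d_out x r) and A: (r x d_in) trainable low-rank factors, r << d_in.
    """
    r = len(A)  # rank of the adaptation
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy shapes: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
print(lora_update(W, A, B, alpha=1.0))
```

For a rank r adapter on a d x d layer, only 2 * d * r parameters train instead of d * d, which is what makes per-target fine-tuning of a foundational model affordable.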
Regarding model size, as discussed in Section Model Comparison According to Size, increasing the number of model parameters tends to enhance performance. Our current computational resources allow us to train a GPT-2 XL model with 1.5B parameters; training even larger models could improve performance further.
Finally, although our group focuses on AD, making the proteins studied in this work directly relevant, a similar pipeline could be applied to any target protein as long as a binding-site structure is available.
Supplementary Material
Acknowledgments
This work was funded by grants PID2022-137331OB-C33 (N.E.C.) and MICIU/AEI/10.13039/501100011033, FEDER and EU and project TED2021-129970B-C21 (N.E.C.) financed by MCIN/AEI and European Union “NextGenerationEU”/PRTR, IND2023/BMD-27452 (P.V.P.) financed by CAM, and PTQ2023-012950 (G.M.-A.) financed by MCIU/AEI.
The training data for the foundational models can be downloaded from ZINC20 https://zinc.docking.org/. Code and model weights of MolOrgGPT and ADMolOrgGPT will be made available upon request by contacting the corresponding author.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c02400.
Tables S1–S6: ADMolOrgGPT generated molecules and corresponding docking scores, including single-pocket, scaffold-constrained, and multitarget strategies; Figure S1: Ligand–pocket interactions for selected protein–ligand complexes (PDF)
The authors declare no competing financial interest.
References
- Zhang K., Yang X., Wang Y., Yu Y., Huang N., Li G., Li X., Wu J. C., Yang S.. Artificial intelligence in drug development. Nature Medicine. 2025;31:45–59. doi: 10.1038/s41591-024-03434-4. [DOI] [PubMed] [Google Scholar]
- Arnold C.. Inside the nascent industry of AI-designed drugs. Nat. Med. 2023;29:1292–1295. doi: 10.1038/s41591-023-02361-0. [DOI] [PubMed] [Google Scholar]
- Gallego V., Naveiro R., Roca C., Rios Insua D., Campillo N. E.. AI in drug development: a multidisciplinary perspective. Molecular Diversity. 2021;25:1461–1479. doi: 10.1007/s11030-021-10266-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xie S., Zhu H., Huang N.. AI-Designed Molecules in Drug Discovery, Structural Novelty Evaluation, and Implications. J. Chem. Inf. Model. 2025;65:8924–8933. doi: 10.1021/acs.jcim.5c00921. [DOI] [PubMed] [Google Scholar]
- Gromski P. S., Henson A. B., Granda J. M., Cronin L.. How to explore chemical space using algorithms and automation. Nature Reviews Chemistry. 2019;3:119–128. doi: 10.1038/s41570-018-0066-y. [DOI] [Google Scholar]
- Polishchuk P. G., Madzhidov T. I., Varnek A.. Estimation of the size of drug-like chemical space based on GDB-17 data. Journal of computer-aided molecular design. 2013;27:675–679. doi: 10.1007/s10822-013-9672-4. [DOI] [PubMed] [Google Scholar]
- Feng W., Wang L., Lin Z., Zhu Y., Wang H., Dong J., Bai R., Wang H., Zhou J., Peng W.. et al. Generation of 3D molecules in pockets via a language model. Nature Machine Intelligence. 2024;6:62–73. doi: 10.1038/s42256-023-00775-6. [DOI] [Google Scholar]
- Deng J., Yang Z., Wang H., Ojima I., Samaras D., Wang F.. A systematic study of key elements underlying molecular property prediction. Nat. Commun. 2023;14:6395. doi: 10.1038/s41467-023-41948-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dai J., Zhou Z., Zhao Y., Kong F., Zhai Z., Zhu Z., Cai J., Huang S., Xu Y., Sun T.. Combined usage of ligand-and structure-based virtual screening in the artificial intelligence era. Eur. J. Med. Chem. 2025;283:117162. doi: 10.1016/j.ejmech.2024.117162. [DOI] [PubMed] [Google Scholar]
- Jin, W. ; Barzilay, R. ; Jaakkola, T. . Multi-objective molecule generation using interpretable substructures. International conference on machine learning; 2020; pp 4849–4859. [Google Scholar]
- Jin, W. ; Barzilay, R. ; Jaakkola, T. . Hierarchical generation of molecular graphs using structural motifs. International conference on machine learning; 2020; pp 4839–4848. [Google Scholar]
- Subramanian G., Ramsundar B., Pande V., Denny R. A.. Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. J. Chem. Inf. Model. 2016;56:1936–1949. doi: 10.1021/acs.jcim.6b00290. [DOI] [PubMed] [Google Scholar]
- García E. G., Varas P., González-Naranjo P., Ulzurrun E., Marcos-Ayuso G., Pérez C., Páez J. A., Insua D. R., Santana S. R., Campillo N. E.. AI-Driven De Novo Design and Development of Nontoxic DYRK1A Inhibitors. J. Med. Chem. 2025;68:10346. doi: 10.1021/acs.jmedchem.5c00512. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Besnard J., Ruda G. F., Setola V., Abecassis K., Rodriguiz R. M., Huang X.-P., Norval S., Sassano M. F., Shin A. I., Webster L. A.. et al. Automated design of ligands to polypharmacological profiles. Nature. 2012;492:215–220. doi: 10.1038/nature11691. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stokes J. M., Yang K., Swanson K., Jin W., Cubillos-Ruiz A., Donghia N. M., MacNair C. R., French S., Carfrae L. A., Bloom-Ackermann Z.. et al. A deep learning approach to antibiotic discovery. Cell. 2020;180:688–702. doi: 10.1016/j.cell.2020.01.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Prathipati P., Dixit A., Saxena A. K.. Computer-aided drug design: Integration of structure-based and ligand-based approaches in drug design. Current Computer-Aided Drug Design. 2007;3:133–148. doi: 10.2174/157340907780809516. [DOI] [Google Scholar]
- Tang Y., Moretti R., Meiler J.. Recent advances in automated structure-based de novo drug design. J. Chem. Inf. Model. 2024;64:1794–1805. doi: 10.1021/acs.jcim.4c00247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bilodeau C., Jin W., Jaakkola T., Barzilay R., Jensen K. F.. Generative models for molecular discovery: Recent advances and challenges. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2022;12:e1608. doi: 10.1002/wcms.1608. [DOI] [Google Scholar]
- Sadybekov A. V., Katritch V.. Computational approaches streamlining drug discovery. Nature. 2023;616:673–685. doi: 10.1038/s41586-023-05905-z. [DOI] [PubMed] [Google Scholar]
- Li A. P.. Accurate prediction of human drug toxicity: a major challenge in drug development. Chemico-biological interactions. 2004;150:3–7. doi: 10.1016/j.cbi.2004.09.008. [DOI] [PubMed] [Google Scholar]
- Ren F., Aliper A., Chen J., Zhao H., Rao S., Kuppe C., Ozerov I. V., Zhang M., Witte K., Kruse C.. et al. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models. Nat. Biotechnol. 2025;43:63–75. doi: 10.1038/s41587-024-02143-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schneider P., Walters W. P., Plowright A. T., Sieroka N., Listgarten J., Goodnow Jr R. A., Fisher J., Jansen J. M., Duca J. S., Rush T. S.. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discovery. 2020;19:353–364. doi: 10.1038/s41573-019-0050-3. [DOI] [PubMed] [Google Scholar]
- Zhang Y., Li S., Xing M., Yuan Q., He H., Sun S.. Universal approach to de novo drug design for target proteins using deep reinforcement learning. ACS omega. 2023;8:5464–5474. doi: 10.1021/acsomega.2c06653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trott O., Olson A. J.. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry. 2010;31:455–461. doi: 10.1002/jcc.21334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guo J., Janet J. P., Bauer M. R., Nittinger E., Giblin K. A., Papadopoulos K., Voronov A., Patronov A., Engkvist O., Margreitter C.. DockStream: a docking wrapper to enhance de novo molecular design. Journal of Cheminformatics. 2021;13:89. doi: 10.1186/s13321-021-00563-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Corso G., Stärk H., Jing B., Barzilay R., Jaakkola T.. Diffdock: Diffusion steps, twists, and turns for molecular docking. arXiv:2210.01776 [q-bio.BM] 2022:na. doi: 10.48550/arXiv.2210.01776. [DOI] [Google Scholar]
- Jeon W., Kim D.. Autonomous molecule generation using reinforcement learning and docking to develop potential novel inhibitors. Sci. Rep. 2020;10:22104. doi: 10.1038/s41598-020-78537-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thomas M., O’Boyle N., Bender A., de Graaf C.. Augmented hill-climb increases reinforcement learning efficiency for language-based de novo molecule generation. J. Cheminformatics. 2022;14:68. doi: 10.1186/s13321-022-00646-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Loeffler H. H., He J., Tibo A., Janet J. P., Voronov A., Mervin L. H., Engkvist O.. Reinvent 4: modern AI–driven generative molecule design. Journal of Cheminformatics. 2024;16:20. doi: 10.1186/s13321-024-00812-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu X., Liu G., Zhao Y., Zhang H.. De novo drug design using reinforcement learning with multiple gpt agents. Advances in Neural Information Processing Systems. 2023;36:7405–7418. [Google Scholar]
- Zhou X., Cheng X., Yang Y., Bao Y., Wang L., Gu Q.. Decompopt: Controllable and decomposed diffusion models for structure-based molecular optimization. arXiv:2403.13829 [q-bio.BM] 2024:na. doi: 10.48550/arXiv.2403.13829. [DOI] [Google Scholar]
- Munson B. P., Chen M., Bogosian A., Kreisberg J. F., Licon K., Abagyan R., Kuenzi B. M., Ideker T.. De novo generation of multi-target compounds using deep generative chemistry. Nat. Commun. 2024;15:3636. doi: 10.1038/s41467-024-47120-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang W., Pei J., Lai L.. Computational multitarget drug design. J. Chem. Inf. Model. 2017;57:403–412. doi: 10.1021/acs.jcim.6b00491. [DOI] [PubMed] [Google Scholar]
- Liu Z., Roberts R. A., Lal-Nag M., Chen X., Huang R., Tong W.. AI-based language models powering drug discovery and development. Drug Discovery Today. 2021;26:2593–2607. doi: 10.1016/j.drudis.2021.06.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flam-Shepherd D., Zhu K., Aspuru-Guzik A.. Language models can learn complex molecular distributions. Nat. Commun. 2022;13:3293. doi: 10.1038/s41467-022-30839-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zheng Y., Koh H. Y., Ju J., Nguyen A. T., May L. T., Webb G. I., Pan S.. Large language models for scientific discovery in molecular property prediction. Nature Machine Intelligence. 2025;7:437. doi: 10.1038/s42256-025-00994-z. [DOI] [Google Scholar]
- Schulman J., Wolski F., Dhariwal P., Radford A., Klimov O.. Proximal policy optimization algorithms. arXiv:1707.06347 [cs.LG] 2017:na. doi: 10.48550/arXiv.1707.06347. [DOI] [Google Scholar]
- Mazuz E., Shtar G., Shapira B., Rokach L.. Molecule generation using transformers and policy gradient reinforcement learning. Sci. Rep. 2023;13:8799. doi: 10.1038/s41598-023-35648-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Popova M., Isayev O., Tropsha A.. Deep reinforcement learning for de novo drug design. Science advances. 2018;4:eaap7885. doi: 10.1126/sciadv.aap7885. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Radford A., Wu J., Child R., Luan D., Amodei D., Sutskever I.. et al. Language models are unsupervised multitask learners. OpenAI blog. 2019;1:9. [Google Scholar]
- Irwin J. J., Tang K. G., Young J., Dandarchuluun C., Wong B. R., Khurelbaatar M., Moroz Y. S., Mayfield J., Sayle R. A.. ZINC20: a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 2020;60:6065–6073. doi: 10.1021/acs.jcim.0c00675. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weininger D.. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences. 1988;28:31–36. doi: 10.1021/ci00057a005. [DOI] [Google Scholar]
- Krenn M., Häse F., Nigam A., Friederich P., Aspuru-Guzik A.. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology. 2020;1:045024. doi: 10.1088/2632-2153/aba947. [DOI] [Google Scholar]
- Nogueira V. H., Sharma R., Guido R. V., Keiser M. J.. Fuzz testing molecular representation using deep variational anomaly generation. J. Chem. Inf. Model. 2025;65:1911–1927. doi: 10.1021/acs.jcim.4c01876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Souza M. M., Cenci A. R., Teixeira K. F., Machado V., Mendes Schuler M. C. G., Gon A. E., Paula Dalmagro A., André Cazarin C., Gomes Ferreira L. L., de Oliveira A. S.. et al. DYRK1A Inhibitors and Perspectives for the Treatment of Alzheimer’s Disease. Curr. Med. Chem. 2023;30:669–688. doi: 10.2174/0929867329666220620162018. [DOI] [PubMed] [Google Scholar]
- John V., Beck J. P., Bienkowski M. J., Sinha S., Heinrikson R. L.. Human β-secretase (BACE) and BACE inhibitors. Journal of medicinal chemistry. 2003;46:4625–4630. doi: 10.1021/jm030247h. [DOI] [PubMed] [Google Scholar]
- Greig N. H., Utsuki T., Yu Q.-s., Zhu X., Holloway H. W., Perry T., Lee B., Ingram D. K., Lahiri D. K.. A new therapeutic target in Alzheimer’s disease treatment: attention to butyrylcholinesterase. Current medical research and opinion. 2001;17:159–165. doi: 10.1185/03007990152673800. [DOI] [PubMed] [Google Scholar]
- Peng, X. ; Luo, S. ; Guan, J. ; Xie, Q. ; Peng, J. ; Ma, J. . Pocket2mol: Efficient molecular sampling based on 3d protein pockets. International Conference on Machine Learning; 2022; pp 17644–17655. [Google Scholar]
- Li, Y. ; Gao, C. ; Song, X. ; Wang, X. ; Xu, Y. ; Han, S. . DrugGPT: A GPT-based strategy for designing potential ligands targeting specific proteins. bioRxiv 2023. doi: 10.1101/2023.06.29.543848. [DOI] [Google Scholar]
- Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser Ł., Polosukhin I.. Attention is all you need. Advances in neural information processing systems. 2017;30:na. [Google Scholar]
- Mao J., Wang J., Zeb A., Cho K.-H., Jin H., Kim J., Lee O., Wang Y., No K. T.. Transformer-based molecular generative model for antiviral drug design. J. Chem. Inf. Model. 2024;64:2733–2745. doi: 10.1021/acs.jcim.3c00536. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chithrananda S., Grand G., Ramsundar B.. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. arXiv:2010.09885 [cs.LG] 2020:na. doi: 10.48550/arXiv.2010.09885. [DOI] [Google Scholar]
- Devlin, J. ; Chang, M.-W. ; Lee, K. ; Toutanova, K. . Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 conference of the North American Chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers); 2019; pp 4171–4186. [Google Scholar]
- Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O., Lewis M., Zettlemoyer L., Stoyanov V.. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692 [cs.CL] 2019:na. doi: 10.48550/arXiv.1907.11692. [DOI] [Google Scholar]
- Heyer, K. Roberta-zinc-480m. https://huggingface.co/entropy/roberta_zinc_480m, 2023; MIT License. Chemical masked-language model trained on 480M SMILES from the ZINC database.
- Bagal V., Aggarwal R., Vinod P., Priyakumar U. D.. MolGPT: molecular generation using a transformer-decoder model. J. Chem. Inf. Model. 2022;62:2064–2076. doi: 10.1021/acs.jcim.1c00600. [DOI] [PubMed] [Google Scholar]
- Karpathy, A. NanoGPT. 2022. https://github.com/karpathy/nanoGPT.
- Kuncham, G. K. R. ; Vaidya, R. ; Barve, M. . Performance Study of GPU applications using SYCL and CUDA on Tesla V100 GPU. 2021 IEEE High Performance Extreme Computing Conference (HPEC). 2021; pp 1–7. [Google Scholar]
- Lipinski C. A.. Lead-and drug-like compounds: the rule-of-five revolution. Drug discovery today: Technologies. 2004;1:337–341. doi: 10.1016/j.ddtec.2004.11.007. [DOI] [PubMed] [Google Scholar]
- Olivecrona M., Blaschke T., Engkvist O., Chen H.. Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics. 2017;9:1–14. doi: 10.1186/s13321-017-0235-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sennrich R., Haddow B., Birch A.. Neural machine translation of rare words with subword units. arXiv:1508.07909 [cs.CL] 2015:na. doi: 10.48550/arXiv.1508.07909. [DOI] [Google Scholar]
- Leon M., Perezhohin Y., Peres F., Popovič A., Castelli M.. Comparing SMILES and SELFIES tokenization for enhanced chemical language modeling. Sci. Rep. 2024;14:25016. doi: 10.1038/s41598-024-76440-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hayes C. F., Rădulescu R., Bargiacchi E., Källström J., Macfarlane M., Reymond M., Verstraeten T., Zintgraf L. M., Dazeley R., Heintz F.. et al. A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems. 2022;36:26. doi: 10.1007/s10458-022-09552-y. [DOI] [Google Scholar]
- Schrödinger, LLC Schrödinger platform for drug discovery. 2024; https://www.schrodinger.com (Accessed: 2025–08–06).
- Czarna A., Wang J., Zelencova D., Liu Y., Deng X., Choi H. G., Zhang T., Zhou W., Chang J. W., Kildalsen H.. et al. Novel scaffolds for dual specificity tyrosine-phosphorylation-regulated kinase (DYRK1A) inhibitors. Journal of Medicinal Chemistry. 2018;61:7560–7572. doi: 10.1021/acs.jmedchem.7b01847. [DOI] [PubMed] [Google Scholar]
- Panek D., Pasieka A., Latacz G., Zarȩba P., Szczȩch M., Godyń J., Chantegreil F., Nachon F., Brazzolotto X., Skrzypczak-Wiercioch A.. et al. Discovery of new, highly potent and selective inhibitors of BuChE-design, synthesis, in vitro and in vivo evaluation and crystallography studies. Eur. J. Med. Chem. 2023;249:115135. doi: 10.1016/j.ejmech.2023.115135. [DOI] [PubMed] [Google Scholar]
- Wu Y.-J., Guernon J., Rajamani R., Toyn J. H., Ahlijanian M. K., Albright C. F., Muckelbauer J., Chang C., Camac D., Macor J. E.. et al. Discovery of furo [2,3-d][1,3] thiazinamines as beta amyloid cleaving enzyme-1 (BACE1) inhibitors. Bioorganic and Medicinal Chemistry Letters. 2016;26:5729–5731. doi: 10.1016/j.bmcl.2016.10.055. [DOI] [PubMed] [Google Scholar]
- Schrödinger, LLC. Schrödinger Release 2021–1: LigPrep; New York, NY, 2021. [Google Scholar]
- Schrödinger, LLC. Schrödinger Release 2021–1: Epik; New York, NY, 2021. [Google Scholar]
- Jorgensen W. L., Maxwell D. S., Tirado-Rives J.. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American chemical society. 1996;118:11225–11236. doi: 10.1021/ja9621760. [DOI] [Google Scholar]
- Schrödinger, LLC. Schrödinger Release 2021–1: Protein Preparation Wizard; Epik; Impact; Prime; New York, NY, 2021. [Google Scholar]
- Tang X., Dai H., Knight E., Wu F., Li Y., Li T., Gerstein M.. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Briefings in Bioinformatics. 2024;25:bbae338. doi: 10.1093/bib/bbae338. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spiegel J. O., Durrant J. D.. AutoGrow4: an open-source genetic algorithm for de novo drug design and lead optimization. Journal of Cheminformatics. 2020;12:1–16. doi: 10.1186/s13321-020-00429-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bickerton G. R., Paolini G. V., Besnard J., Muresan S., Hopkins A. L.. Quantifying the chemical beauty of drugs. Nature Chem. 2012;4:90–98. doi: 10.1038/nchem.1243. [DOI] [PMC free article] [PubMed] [Google Scholar]
- RDKit: Open-source cheminformatics. https://www.rdkit.org (accessed: 12-Jun-2024).
- Ertl P., Schuffenhauer A.. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics. 2009;1:1–11. doi: 10.1186/1758-2946-1-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bajusz D., Rácz A., Héberger K.. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? Journal of Cheminformatics. 2015;7:1–13. doi: 10.1186/s13321-015-0069-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heid E., Greenman K. P., Chung Y., Li S.-C., Graff D. E., Vermeire F. H., Wu H., Green W. H., McGill C. J.. Chemprop: a machine learning package for chemical property prediction. J. Chem. Inf. Model. 2024;64:9–17. doi: 10.1021/acs.jcim.3c01250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Richard A. M., Huang R., Waidyanatha S., Shinn P., Collins B. J., Thillainadarajah I., Grulke C. M., Williams A. J., Lougee R. R., Judson R. S.. et al. The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 2021;34:189–216. doi: 10.1021/acs.chemrestox.0c00264. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown N., Fiscato M., Segler M. H., Vaucher A. C.. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 2019;59:1096–1108. doi: 10.1021/acs.jcim.8b00839. [DOI] [PubMed] [Google Scholar]
- Francoeur P. G., Masuda T., Sunseri J., Jia A., Iovanisci R. B., Snyder I., Koes D. R.. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 2020;60:4200–4215. doi: 10.1021/acs.jcim.0c00411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- jglaser/binding_affinity Dataset. https://huggingface.co/datasets/jglaser/binding_affinity.
- Huang L., Xu T., Yu Y., Zhao P., Chen X., Han J., Xie Z., Li H., Zhong W., Wong K.-C.. et al. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nat. Commun. 2024;15:2657. doi: 10.1038/s41467-024-46569-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schneuing A., Harris C., Du Y., Didi K., Jamasb A., Igashov I., Du W., Gomes C., Blundell T. L., Lio P.. et al. Structure-based drug design with equivariant diffusion models. Nature Computational Science. 2024;4:899–909. doi: 10.1038/s43588-024-00737-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu K., Xia Y., Deng P., Liu R., Zhang Y., Guo H., Cui Y., Pei Q., Wu L., Xie S.. et al. TamGen: drug design with target-aware molecule generation through a chemical language model. Nat. Commun. 2024;15:9360. doi: 10.1038/s41467-024-53632-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xie J., Chen S., Lei J., Yang Y.. DiffDec: structure-aware scaffold decoration with an end-to-end diffusion model. J. Chem. Inf. Model. 2024;64:2554–2564. doi: 10.1021/acs.jcim.3c01466. [DOI] [PubMed] [Google Scholar]
- Bolognesi M. L., Cavalli A.. Multitarget drug discovery and polypharmacology. ChemMedChem. 2016;11:1190–1192. doi: 10.1002/cmdc.201600161. [DOI] [PubMed] [Google Scholar]
- Medina-Franco J. L., Giulianotti M. A., Welmaker G. S., Houghten R. A.. Shifting from the single to the multitarget paradigm in drug discovery. Drug discovery today. 2013;18:495–501. doi: 10.1016/j.drudis.2013.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yuan Y., Pei J., Lai L.. Ligbuilder v3: a multi-target de novo drug design approach. Frontiers in Chemistry. 2020;8:142. doi: 10.3389/fchem.2020.00142. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gaulton A., Hersey A., Nowotka M., Bento A. P., Chambers J., Mendez D., Mutowo P., Atkinson F., Bellis L. J., Cibrián-Uhalte E.. et al. The ChEMBL database in 2017. Nucleic Acids Research. 2017;45:D945–D954. doi: 10.1093/nar/gkw1074. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee S., Jo J., Hwang S. J.. Exploring chemical space with score-based out-of-distribution generation. International Conference on Machine Learning. 2023:18872–18892. [Google Scholar]
- Guan J., Qian W. W., Peng X., Su Y., Peng J., Ma J.. 3d equivariant diffusion for target-aware molecule generation and affinity prediction. arXiv:2303.03543 [q-bio.BM] 2023:na. doi: 10.48550/arXiv.2303.03543. [DOI] [Google Scholar]
- Kaplan J., McCandlish S., Henighan T., Brown T. B., Chess B., Child R., Gray S., Radford A., Wu J., Amodei D.. Scaling laws for neural language models. arXiv:2001.08361 [cs.LG] 2020:na. doi: 10.48550/arXiv.2001.08361. [DOI] [Google Scholar]
- Li P., Li Y., Hsieh C.-Y., Zhang S., Liu X., Liu H., Song S., Yao X.. TrimNet: learning molecular representation from triplet messages for biomedicine. Briefings in Bioinformatics. 2021;22:bbaa266. doi: 10.1093/bib/bbaa266. [DOI] [PubMed] [Google Scholar]
- Wu Z., Ramsundar B., Feinberg E. N., Gomes J., Geniesse C., Pappu A. S., Leswing K., Pande V.. MoleculeNet: a benchmark for molecular machine learning. Chemical science. 2018;9:513–530. doi: 10.1039/C7SC02664A. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang D., Li Y., Jiang J., Ding Z., Luo Z., Jiang G., Liang J., Yang D.. Tokenization matters! degrading large language models through challenging their tokenization. arXiv:2405.17067 [cs.CL] 2024:na. doi: 10.48550/arXiv.2405.17067. [DOI] [Google Scholar]
- Gummesson Svensson H., Tyrchan C., Engkvist O., Haghir Chehreghani M.. Utilizing reinforcement learning for de novo drug design. Machine Learning. 2024;113:4811–4843. doi: 10.1007/s10994-024-06519-w. [DOI] [Google Scholar]
- Hu E. J., Shen Y., Wallis P., Allen-Zhu Z., Li Y., Wang S., Wang L., Chen W.. et al. Lora: Low-rank adaptation of large language models. ICLR. 2022;1:3. [Google Scholar]