Briefings in Bioinformatics. 2026 Jan 19;27(1):bbaf727. doi: 10.1093/bib/bbaf727

An effective fragment-based dual conditional diffusion framework for molecular generation

Haotian Chen 1,2,3, Yiting Shen 4, Jichun Li 5, Weizhong Zhao 6,7,8
PMCID: PMC12814976  PMID: 41554053

Abstract

Fragment-based molecular generation has emerged as a promising paradigm in structure-based drug design (SBDD), yielding compounds with desirable properties such as chemical validity, synthetic feasibility, and pharmacological relevance. However, existing approaches often struggle to generate molecules that both conform to 3D structural constraints and remain chemically plausible. This is largely because prior works treat the scaffolds and R-groups of molecules indiscriminately, overlooking their distinct semantic roles. Specifically, the scaffold serves as the rigid structural backbone that determines the global geometric topology and binding pose, whereas R-groups act as functional substituents responsible for fine-tuning local physicochemical interactions. Therefore, in this work, we propose fragment-based dual conditional diffusion (FDC-Diff), a novel dual conditional diffusion framework that integrates chemical priors and structural cues for fragment-based molecular generation. Unlike traditional de novo methods that generate atoms sequentially, FDC-Diff decomposes the molecule generation process into two semantically complementary stages. Given the protein pocket and an initial fragment, the first stage constructs a spatially constrained scaffold that captures the global molecular topology. The second stage elaborates R-groups on the obtained scaffold to capture local semantics and further refine molecular properties. To ensure synthetic accessibility, the initial fragments and the scaffold-modification hierarchy are derived from curated reaction rules, and a physical-chemistry-inspired refinement step is applied to optimize the final conformations. Experimental results on multiple SBDD benchmarks demonstrate that FDC-Diff achieves state-of-the-art performance in comprehensive evaluations. Furthermore, our model excels at producing chemically valid, spatially compatible, and pharmacologically relevant molecules, suggesting its potential as a feasible tool for fragment-based drug design.

Keywords: fragment-based molecular generation, structure-based drug design, conditional diffusion model

Introduction

Recent years have witnessed the rapid development of artificial intelligence-assisted drug design (AIDD) technology, which provides powerful tools for discovery and optimization of small molecule drugs [1, 2]. Within the generative strategy-based models of AIDD, de novo molecular design [3–7] has become a popular paradigm. Based on the representation granularity for compounds, de novo molecular design methods can be roughly classified into two families: atom-based and fragment-based [7]. Generally, atom-based de novo design generates molecules directly at the level of atoms and bonds, incrementally assembling structures while enforcing chemical constraints (e.g. valence, ring closure, and stereochemistry) and determining connectivity on the fly. In contrast, fragment-based de novo design constructs molecules by linking or extending chemically meaningful fragments, which typically starts from known or validated active fragments and generates remaining components reasonably by integrating knowledge from structural biology and computational analysis [8–11]. Since fragment-based methods are more aligned with the actual drug development process [8–11], this study focuses on the molecular generation from this line of de novo molecular design.

In recent years, researchers have proposed several fragment-based drug design methods based on different strategies [2, 12–15]. However, existing fragment-based methods [16–18] have the following limitations: (i) 2D representations fail to capture 3D conformations and pocket context, posing challenges for direct structure-based design [18, 19]; (ii) most methods adopt bottom-up assembly (i.e. constructing molecules fragment by fragment), which is prone to error accumulation and offers limited global control over functional regions and synthetic accessibility (SA). This may ultimately lead to the collapse of the molecular structure, preventing the formation of a stable and reasonable 3D conformation [19, 20]. This issue is illustrated by the examples shown in Fig. 1.

Figure 1.

Alt text: Visualization comparing molecules generated by different methods in protein binding pockets. FDC-Diff produces more drug-like, geometrically consistent structures than the other methods.

Visualization of molecules generated by different methods. (i) Atom-level methods (PMDM, Pocket2Mol): These methods generally exhibit a tendency to generate molecules with a low number of heavy atoms and simple structures, resulting in generated molecules lacking sufficient pharmacophoric features to maintain biological activity. Furthermore, in an attempt to fill larger protein pocket cavities, the Pocket2Mol model tends to generate large polycyclic systems, which lack physical realism. (ii) Fragment-level methods: FLAG tends to stack large, rigid fragments, facing significant challenges in geometric refinement. Errors during each fragment generation step accumulate, ultimately leading to the collapse of the molecular structure. FDC-Diff generates drug-like molecules with appropriate heavy atom counts. By ensuring coherent global topology alongside precise local geometry, it significantly mitigates the risk of generating physically invalid structures.

To address these structural failures, we argue that generative models must align with the intrinsic hierarchical nature of medicinal chemistry. From a structure–function perspective, a drug-like molecule is not a uniform collection of atoms but a hierarchical entity consisting of two distinct components. A molecule typically contains a relatively stable core region that determines the overall topology, provides a connective framework to integrate various functional groups or fragments, and governs target-binding characteristics, as well as a peripheral modification region responsible for regulating the physicochemical and pharmacological properties of the molecule [21, 22]. These two components differ significantly in structural complexity, functional roles, and synthetic strategies, as well as in the ways computational models handle them. The biases and error-prone assembly in existing methods often stem from treating core and peripheral elements uniformly, leading to suboptimal exploration of novel fragments (especially in peripheral regions) and insufficient global stability enforced by the core. For example, as shown in Fig. 1, due to the lack of essential chemical prior knowledge, some models tend to generate invalid structures. Effectively distinguishing between these core and peripheral regions within molecular structures is crucial for further improving the precision and controllability of molecular generation processes [23–27].

To illustrate this distinction, Fig. 2a shows the evolution of the quinolone lineage, for which the bicyclic heterocycle works as the core structure. More specifically, early non-fluorinated quinolones (e.g. nalidixic acid) were primarily used to treat Gram-negative urinary tract infections due to their limited spectrum and poor pharmacokinetics. With the introduction of a C-6 fluorine atom and a C-7 piperazinyl substituent, significant improvements were obtained in both the drug’s antibacterial activity and pharmacokinetic properties, resulting in the development of more effective agents (e.g. norfloxacin and ciprofloxacin). Later respiratory quinolones (e.g. levofloxacin and gatifloxacin) and their analogs (e.g. moxifloxacin and gemifloxacin) further extended the coverage to Gram-positive bacteria and anaerobes, while also improving the drug safety profiles [28–30]. In this context, the bicyclic heterocycle scaffold is essential for defining the drug class, establishing the fundamental mechanism of action, and providing the molecular framework for antibacterial activity. In contrast to the scaffold’s structural and functional role, Fig. 2b presents the α-methylene-γ-butyrolactone (MBL) pharmacophore and its examples from various natural products, emphasizing the structural characteristics of MBL as a substituent. As a Michael acceptor, MBL forms covalent bonds with nucleophilic residues, which modulate the activity of specific biological targets. MBL is commonly found in bioactive natural products such as helenalin, arglabin, and eriolangin [31]. Within this framework, the R-group is pivotal in determining the drug’s potency, selectivity, and covalent binding ability. Generally speaking, examples in Fig. 2 highlight the distinct roles of the scaffold and R-group in molecular design, i.e. the scaffold preserves the structural integrity while the R-group enhances the drug’s properties, such as efficacy, targeting ability, and safety.

Figure 2.

Alt text: Illustration of distinct roles of scaffold and R-groups in molecular design. (a) Quinolone drug structures sharing a common core with varying substituents; (b) examples of molecules containing an alpha-methylene-gamma-butyrolactone moiety.

Illustration of distinct roles of scaffold and R-groups in molecular design. (a) Quinolone-based drugs with the bicyclic heterocycle core structure. The substituents at position 3 (COOH) and position 4 (=O) are critical for the drug’s binding to enzymes, making them common features of quinolone-based drugs. The remaining substituents are related to the drug’s antimicrobial properties, side effects, and metabolic characteristics. (b) Natural products and active molecules containing the α-methylene-γ-butyrolactone moiety. The α-methylene-γ-butyrolactone moiety can form covalent bonds with biomacromolecules through 1,4-conjugate addition, thereby influencing the biological function of these molecules. This moiety is regarded as a pharmacophore with a covalent binding mode.

Based on the observations mentioned above, in this study, we propose a novel fragment-based dual conditional diffusion model, termed FDC-Diff, for structure-based drug design (SBDD). This framework decomposes the molecular generation process into two synergistic substages based on preconstructed molecular fragments, which are responsible for capturing the global and local semantic information of molecules, respectively. First, we preprocess existing small molecules to obtain chemically meaningful fragments generated based on validated reaction rules and decompose existing bioactive molecules based on specific chemical reaction templates. Then, in the first generation stage, FDC-Diff focuses on constructing a 3D structure that satisfies the spatial constraints of the protein binding pocket. This structure undertakes the global topology and spatial anchoring of the molecule, which will determine the overall shape, binding mode, and potential growth sites of key pharmacophoric groups. Next, in the second generation stage, FDC-Diff learns to continue molecular growth on the obtained scaffold by refining local chemical environments. This stage is designed to optimize specific local properties (e.g. hydrophilicity and solubility) enabling functional modification and fine-tuning of physicochemical attributes. Distinct from holistic generation approaches, this stepwise progression allows FDC-Diff to balance macro-level geometric constraints with micro-level chemical validity. While previous methods often struggle to satisfy both criteria simultaneously, our framework’s decoupled nature allows for targeted optimization at each scale. Below, we discuss the specific advantages of this architecture compared with existing methods.

The key distinction from previous methods lies in FDC-Diff’s innovative integration of fragment-based preprocessing with a dual-stage conditional diffusion paradigm and a graph neural network (GNN)-based bond reconstruction mechanism. Specifically, unlike Pocket2Mol [32], which relies on atom-by-atom autoregressive sampling that can suffer from inefficiency and invalid structures due to sequential dependencies, FDC-Diff leverages cross-stage conditional diffusion to generate fragments in a more parallelized manner, enhancing computational efficiency, molecular validity, and adherence to protein pocket constraints. In contrast to PMDM [33], an atom-level diffusion model that starts from random noise and may struggle with capturing chemically meaningful intermediates, FDC-Diff initiates diffusion at the fragment level using preconstructed, reaction-rule-validated fragments, then incorporates global scaffold construction for spatial anchoring followed by local refinement for physicochemical optimization and GNN-driven bond assembly to ensure precise, stable molecular topologies. Compared with FLAG [18], a fragment-based autoregressive method, which generates the molecule in a single sequential stage by retrieving fragments from a fragment library for assembly, FDC-Diff decomposes the process into synergistic global and local stages, enabling improved control over the generation stage and reducing the risk of generating meaningless structures.

In conclusion, our approach offers the following key contributions:

  • Semantic Disentanglement of Geometry and Properties FDC-Diff strategically decouples the generation process into two cooperative subtasks. We prioritize the scaffold to satisfy global geometric constraints (spatial anchoring) and subsequently elaborate R-groups to tune local physicochemical properties. This hierarchical distinctness ensures that generated molecules possess both precise 3D pocket adaptability and reasonable chemical distributions.

  • Reaction-Informed Dataset and FBDD-Guided Initialization To bridge the gap between generative AI and practical synthesis, we construct the scaffold-R-group dataset via reaction-based rules and employ BRICS for chemically valid fragmentation. Crucially, we utilize fragments adhering to the Three Principles of Fragment-Based Drug Design (FBDD) as the specific starting points for generation. By embedding these chemical priors directly into the process, FDC-Diff incorporates a strong inductive bias, ensuring that both the building blocks and the resulting compounds adhere to synthetic validity and structural integrity.

  • State-of-the-Art Performance and Practical Utility Extensive empirical evaluations across multiple benchmark tasks demonstrate that FDC-Diff significantly surpasses existing baseline methods. FDC-Diff achieves superior results on essential evaluation metrics, including binding affinity, quantitative estimation of drug-likeness (QED), and SA. These findings not only verify the effectiveness of the proposed framework but also highlight its substantial potential for practical deployment in fragment-based drug discovery.

Related work

In this section, we briefly review related work to this study, including several representative methods of atom-based and fragment-based de novo molecular design methods.

Atom-based molecular design

For atom-based de novo molecular design, molecules are typically represented as SMILES strings or molecular graphs, and then generated or constructed by iteratively adding atoms and bonds [7, 15, 32–37]. For example, Peng et al. [32] proposed Pocket2Mol, in which a pocket-conditioned, E(3)-equivariant autoregressive scheme is employed for molecular generation. Specifically, Pocket2Mol predicts the new atom’s relative coordinates using a Gaussian mixture model and infers element type and bond order through equivariant attention. Pocket2Mol effectively couples 3D geometry with bonding information, thereby reducing the accumulation of long-range errors. Following the same autoregressive paradigm for molecular generation, GraphBP employs a 3D GNN to encode the context of both the pocket and the already placed atoms. It uses a local reference atom to define a spherical coordinate system and employs a flow-based head for sequential prediction of the next atom’s type and spherical coordinates. This design inherently ensures equivariance while guaranteeing chemically valid atom placement. In addition, PMDM [33] adopts a pocket-conditioned dual-diffusion formulation with equivariant dynamics. By combining local and global encoders with cross-attention mechanisms, PMDM effectively captures both the semantic and spatial context of proteins, enabling the generation of high-potential, drug-like molecules.

However, existing atom-based molecular design methods have some drawbacks: (i) some approaches [16, 34, 37] still rely on 2D graphs or treat molecules independently of protein pockets, limiting their ability to design receptor-specific, high-affinity molecules; (ii) some methods [32, 33] fail to effectively integrate chemical prior knowledge, leading to unstable generative quality. These limitations typically manifest in two distinct ways: first, producing molecules with few heavy atoms, which results in overly simplistic structures that lack pharmacological value; second, generating chemically invalid topologies (e.g. distorted fused ring systems and unrealistic geometries), which might lead to physically unreasonable structures. These issues are illustrated in Fig. 1.

Fragment-based molecular design

Fragment-based molecular design treats chemically meaningful fragments as building blocks and assembles molecules by linking or growing fragments under explicit rules, thereby improving chemical validity [15]. Following the coarse-to-fine paradigm, the Junction Tree VAE (JT-VAE) [38] ensures graph-level validity through a two-stage generative process. First, it creates a tree-structured scaffold composed of valid substructures, such as rings and bonds. Then, it assembles these substructures into a complete molecular graph by using a message-passing mechanism. This coarse-to-fine generation process ensures the chemical validity of the molecules at every step, and the model efficiently handles large molecular graphs by leveraging substructures. Unlike the treatment of JT-VAE, DeepFrag [16] takes the fragment-based lead optimization as a discriminative fragment-completion task. Specifically, DeepFrag first analyzes the 3D voxelized representation of the receptor–ligand complex, and then utilizes a deep convolutional neural network to predict the molecular fragment that should be added after removing a fragment, thereby completing the ligand structure. This approach allows for rapid and efficient identification of suitable fragments, which can be used to improve the binding affinity of the ligand to the target receptor. Zhang et al. [18] proposed FLAG, in which fragment priors are introduced into SBDD and ligands are generated fragment by fragment directly in 3D under pocket conditioning. The advantages of FLAG lie in that it can not only ensure bond lengths and angles conform to cheminformatics rules, but also can improve both efficiency and accuracy.

Despite the success of fragment-based generation methods in certain applications, they still have the following limitations. First, some methods [38] are designed on 2D representations and overlook 3D conformations and pocket context, which complicates direct structure-based design [18]. To address this limitation, FDC-Diff introduces a fragment-driven diffusion framework that models the 3D structure in Cartesian coordinates. This approach allows for better consideration of 3D conformations and target context, overcoming the limitations of traditional 2D representations. Second, some methods [18] employ autoregressive generation without considering the prior constraints of the scaffold structure, which makes the process more random and less controllable and can generate unreasonable structures (e.g. polycyclic molecules, as shown in Fig. 1) that are chemically unstable and implausible [19, 20]. In contrast, FDC-Diff employs a conditional diffusion model to separately model the global scaffold structure and local functional group information, enhancing the controllability of the generation process. It ensures the stability and rationality of the global structure, thereby reducing the risk of generating unreasonable structures.

Preliminaries

To describe the proposed framework conveniently, this section provides a brief introduction to the diffusion model and a definition of conditional molecular generation.

Brief introduction to diffusion model

The diffusion model [39] consists of two Markov chains: the forward (diffusion) process and the reverse (denoising) process. The diffusion process progressively adds Gaussian noise to the data following a variance-preserving schedule, while the reverse process refines the noisy data, eventually recovering the original data by removing the noise. The key goal of a diffusion model is to learn the reverse process by using a parameterized neural network.

The forward diffusion step can be formulated as

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right) \tag{1}$$

where $x_t$ denotes the noisy representation at time step $t$, $\beta_t$ is the noise variance determined by the schedule, and $I$ is the identity matrix. The noise variance typically follows a linear schedule, with $\beta_t$ increasing from $\beta_1$ to $\beta_T$ across a total of $T$ diffusion steps.
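As a rough illustration of the forward step in Equation (1), the following Python sketch draws noisy samples under a linear variance schedule. The schedule endpoints (1e-4 and 0.02), the number of steps, and all function names are illustrative assumptions, not settings reported in this work.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (endpoints are assumed values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_step(x_prev, beta_t, rng):
    """One draw from q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

def alpha_bar(betas):
    """Cumulative products of (1 - beta_s): the signal fraction remaining at each step."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

rng = random.Random(0)
betas = linear_beta_schedule(1000)
x = [1.0, -2.0, 0.5]            # a toy 3D coordinate
for beta_t in betas:            # run the full forward chain
    x = forward_step(x, beta_t, rng)
abar = alpha_bar(betas)         # abar[-1] is close to zero: almost no signal is left
```

With this kind of schedule the cumulative signal fraction decays towards zero, so the final sample is close to pure Gaussian noise regardless of the starting coordinate.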

The reverse process is parameterized by a neural network that estimates the noise:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right) \tag{2}$$

where $\mu_\theta$ and $\Sigma_\theta$ denote the mean and covariance predicted by the model, respectively. The objective of the training procedure is to minimize the following variational bound:

$$\mathcal{L} = \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2\right] \tag{3}$$

where $\epsilon$ is the injected Gaussian noise and $\epsilon_\theta$ is the noise predicted by the parameterized model.
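The noise-prediction objective can be sketched in a few lines of Python. The sketch uses the standard closed-form expression $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\bar{\alpha}_t$ the cumulative product of $(1-\beta_s)$, which follows from the forward process but is not written out in the text. The "oracle" predictor is a hypothetical stand-in for a trained network and is only there to show that the loss vanishes when the injected noise is recovered exactly.

```python
import math
import random

def noisy_sample(x0, abar_t, rng):
    """Closed-form draw x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps; returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    keep, spread = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    x_t = [keep * v + spread * e for v, e in zip(x0, eps)]
    return x_t, eps

def noise_prediction_loss(eps, eps_hat):
    """Squared L2 distance between injected and predicted noise (one Monte Carlo term)."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_hat))

def oracle_predictor(x_t, x0, abar_t):
    """Hypothetical perfect model: inverts the closed form using the true x0."""
    keep, spread = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [(xt - keep * v) / spread for xt, v in zip(x_t, x0)]

rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
x_t, eps = noisy_sample(x0, abar_t=0.7, rng=rng)

loss_oracle = noise_prediction_loss(eps, oracle_predictor(x_t, x0, 0.7))
loss_zero_guess = noise_prediction_loss(eps, [0.0] * len(eps))  # predicts "no noise"
```

In training, the expectation in the objective would be approximated by averaging such terms over random data points, time steps, and noise draws.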

Definition of conditional molecular generation

Conditional molecular generation is a task to design molecules with desired structures and functions based on specific conditions or constraints. It leverages input conditions, such as physicochemical properties, target binding affinity, or toxicity of molecules, to guide the generation process. By effectively controlling this process, conditional molecular generation is able to design molecules that meet specific functional requirements.

In this study, we extend the existing framework by incorporating conditional diffusion into the molecular generation process. The diffusion model is formulated as a gradual denoising process, where external conditions guide the generation of molecules at each step [35]. Specifically, in the task of 3D molecular generation, diffusion models must handle both discrete atom types and continuous atomic coordinates. Molecules are represented as $\mathcal{M} = (H, X)$, where $H \in \{0,1\}^{N \times K}$ denotes the one-hot encoded matrix of atom types for $N$ atoms over $K$ distinct types, and $X \in \mathbb{R}^{N \times 3}$ represents the corresponding Cartesian coordinates of the atoms. Additionally, we incorporate the information about the protein pocket, which is denoted as $\mathcal{P}$, as well as molecular fragments $\mathcal{F}$ with anchor atoms $\mathcal{A}$ that identify potential growth sites for the fragment. The scaffold information is represented as $\mathcal{S}$ and the R-group information is represented as $\mathcal{R}$. The conditional information is represented as $c$, which in the scaffold stage is denoted as $c_{\mathcal{S}}$, and in the R-group stage is denoted as $c_{\mathcal{R}}$. The goal is to learn the conditional distribution:

$$p_\theta(\mathcal{M} \mid c) \tag{4}$$

By doing so, we generate molecules that not only satisfy specific functional and structural constraints but also allow for optimization of molecular properties based on the given conditions. An overview of the conditional diffusion process is presented in Fig. 3.

Figure 3.

Alt text: Conditional diffusion process: forward adding noise to a molecule and reverse denoising to generate a valid molecule.

Illustration of the conditional generation process. The forward process gradually adds noise to the molecular structure, while the reverse process denoises it to generate chemically valid molecules.

Method

In this section, we elaborate on the proposed molecular generation framework. The overall architecture is shown in Fig. 4. The framework adopts a dual conditional diffusion model, and each stage models different substructures in the 3D molecular structure generation process. In the first stage, the initial skeleton conformation is generated based on the input molecular fragment, growth sites, and protein pockets. In the second stage, the spatial structure of the skeleton is further expanded and refined to generate the complete molecular conformation. Finally, the generated atomic coordinates and atom types are fed into a bond construction tool to ultimately generate a complete 3D molecular structure that closely matches the target protein pocket.

Figure 4.

Alt text: Overview diagram of a dual-stage conditional generation framework: scaffold generation, R-group completion, and bond construction to form the final molecule.

Overview of the proposed dual conditional generation framework. The scaffold stage generates a molecular skeleton from a fragment and anchor atoms, and the R-group stage completes the molecule based on the derived scaffold. Finally, the generated atomic coordinates and atom types are input into the YuelBond framework for bond construction, deriving a complete molecule.

Scaffold generation stage

The molecular scaffold typically constitutes the structural skeleton of a compound, determining its overall geometric layout and fundamental chemical functionality. In the task of molecular generation, the rationality of the scaffold directly impacts the extensibility of the structure as well as the pharmacological relevance and SA of the resulting molecule. Therefore, constructing a reasonable scaffold serves as the critical first stage in the molecular generation pipeline.

The scaffold generation stage requires careful consideration of several key aspects: (i) generating a spatially continuous and chemically reasonable distribution of atomic coordinates to form a well-defined structural framework; (ii) maintaining spatial regularities, such as plausible bond lengths and bond angles, to facilitate subsequent structural reconstruction and refinement; and (iii) incorporating geometric constraints from the protein pocket during the generation process to ensure spatial alignment and functional compatibility with the target binding sites. To systematically achieve effective scaffold construction and functional guidance, we decompose this stage into multiple steps, as detailed below.

Forward process

In the diffusion model for molecular generation, the forward noise injection process aims to gradually transform the initial structure into a noisy state. Throughout this process, the atomic coordinates of the fragment $\mathcal{F}$ remain fixed. Gaussian noise is applied only to the atoms designated for expansion, i.e. the scaffold atoms $\mathcal{S}$, perturbing their spatial positions and types. As the noise is progressively added, the relative positions and types of these atoms change, gradually blurring the originally defined molecular structure and ultimately forming a starting structure with appropriate randomness. To keep the formal description concise, a simplified notation is used in the following subsection.
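The idea of noising only the atoms to be generated while keeping the fragment fixed can be sketched as follows. This is a minimal illustration under stated assumptions: the boolean mask, the toy coordinates, the two-type one-hot encoding, and the use of a single cumulative signal fraction `abar_t` are all hypothetical choices, not details taken from the method.

```python
import math
import random

def noise_rows(rows, is_fixed, abar_t, rng):
    """Noise each row as sqrt(abar_t) * v + sqrt(1 - abar_t) * eps, skipping fixed rows."""
    keep, spread = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    noisy = []
    for row, fixed in zip(rows, is_fixed):
        if fixed:
            noisy.append(list(row))  # fragment atom: copied unchanged
        else:
            noisy.append([keep * v + spread * rng.gauss(0.0, 1.0) for v in row])
    return noisy

# Toy system: two fragment atoms (fixed) and two atoms designated for expansion.
coords = [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.1, 1.2, 0.0], [3.5, 1.2, 0.0]]
types = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot over 2 atom types
is_fragment = [True, True, False, False]

rng = random.Random(0)
noisy_coords = noise_rows(coords, is_fragment, abar_t=0.5, rng=rng)
noisy_types = noise_rows(types, is_fragment, abar_t=0.5, rng=rng)
```

The same helper is applied to coordinates and to one-hot type features, so both the positions and the types of the non-fragment atoms are perturbed while the fragment rows pass through untouched.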

Denoising process

In the denoising process, the anchor atoms $\mathcal{A}$ serve as spatial growth sites. These atoms guide the placement of new scaffold atoms around them, ensuring that both geometric alignment and chemical bonding constraints are satisfied. The anchors thus provide structural cues that guide chemically valid expansion of the scaffold.

To model the molecular expansion process, we use a learnable function $\varphi$, which represents the diffusion dynamics. This function is implemented as a modified E(3)-equivariant Graph Neural Network (EGNN) [36]. At each time step $t$, the input to the network consists of the noisy scaffold $z_t$ and the fixed context $u$. These components are represented as a fully connected graph, where each node $i$ is characterized by its coordinate $x_i$ and feature vector $h_i$.

The model then predicts the noise $\hat{\epsilon}_t$, which consists of the coordinate and feature components. To make the function $\varphi$ invariant to translations, the initial coordinates ($z_t^x$) are subtracted from the coordinate component of the predicted noise following Hoogeboom et al. [35]. Specifically, the predicted noise is given by:

$$\hat{\epsilon}_t = \left[\hat{\epsilon}_t^{x}, \hat{\epsilon}_t^{h}\right] = \varphi(z_t, u, t) = \mathrm{EGNN}(z_t, u, t) - \left[z_t^{x}, 0\right] \tag{5}$$

The EGNN mainly consists of a sequence of E(3)-equivariant Graph Convolution Layers (EGCL), each of which is defined as:

$$m_{ij} = \phi_e\left(h_i^l, h_j^l, d_{ij}^2, \mathrm{RBF}(d_{ij})\right) \tag{6}$$
$$m_i = \sum_{j \neq i} m_{ij} \tag{7}$$
$$h_i^{l+1} = \phi_h\left(h_i^l, m_i\right) \tag{8}$$
$$x_i^{l+1} = x_i^l + \sum_{j \neq i} \left(x_i^l - x_j^l\right)\,\phi_x\left(h_i^l, h_j^l, d_{ij}^2, \mathrm{RBF}(d_{ij})\right) \tag{9}$$

where $d_{ij} = \lVert x_i^l - x_j^l \rVert$. While a scalar distance encodes basic geometric information, it carries limited representational capacity and tends to be smoothed out by deep linear transformations during message passing. To enhance the model’s sensitivity to spatial details, we apply radial basis function (RBF) expansion to the interatomic distance $d_{ij}$ by using a set of Gaussian basis functions:

$$\mathrm{RBF}_k(d_{ij}) = \exp\left(-\frac{(d_{ij} - \mu_k)^2}{2\sigma^2}\right), \quad k = 1, \dots, B \tag{10}$$

Here, $\mu_k$ are fixed or learnable Gaussian centers (typically uniformly distributed over the interval $[d_{\min}, d_{\max}]$); $\sigma$ is the shared bandwidth controlling kernel spread; and the RBF feature vector $\mathrm{RBF}(d_{ij}) \in \mathbb{R}^{B}$ serves as a high-dimensional edge representation in the GNN. Each element $\mathrm{RBF}_k(d_{ij})$ is the response of the distance $d_{ij}$ between node $i$ and node $j$ at the $k$th Gaussian basis function.

Under this framework, the RBF expansion acts as a differentiable binning over the distance domain, allowing the network to explicitly distinguish interaction frequency bands across short-range and long-range neighbors. This helps preserve fine-grained geometric variations and long-range non-covalent interactions in protein–ligand binding pockets.
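The "differentiable binning" behaviour of the RBF expansion is easy to see numerically. The sketch below assumes a Gaussian kernel of the form exp(-(d - mu_k)^2 / (2 sigma^2)) with evenly spaced centers; the normalization, the 0 to 10 range, the number of basis functions, and the function names are illustrative assumptions.

```python
import math

def gaussian_centers(d_min, d_max, num_basis):
    """Evenly spaced centers mu_k on [d_min, d_max] (requires num_basis >= 2)."""
    step = (d_max - d_min) / (num_basis - 1)
    return [d_min + k * step for k in range(num_basis)]

def rbf_expand(d, centers, sigma):
    """Response of distance d at each Gaussian basis function."""
    return [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for mu in centers]

centers = gaussian_centers(0.0, 10.0, 11)     # 0, 1, ..., 10 (illustrative range)
short = rbf_expand(1.5, centers, sigma=1.0)   # a short, bond-scale distance
far = rbf_expand(8.0, centers, sigma=1.0)     # a long-range contact

# The strongest response sits at the center nearest to the input distance.
peak_short = short.index(max(short))
peak_far = far.index(max(far))
```

A short distance activates the low-index basis functions and a long distance activates the high-index ones, so the two kinds of neighbors occupy largely separate feature channels instead of being compressed into a single scalar.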

The functions $\phi_e$, $\phi_h$, and $\phi_x$ are all parameterized by neural networks.

Message update $\phi_e$: The input is the concatenation (denoted by $\Vert$) of the node embeddings $h_i^l$ and $h_j^l$, the squared distance $d_{ij}^2$, and the RBF features $\mathrm{RBF}(d_{ij})$. Two fully connected layers with SiLU activation are applied to this input to derive the output, denoted by $m_{ij}$. Formally,

$$m_{ij} = \phi_e\left(h_i^l, h_j^l, d_{ij}^2, \mathrm{RBF}(d_{ij})\right) = \mathrm{MLP}_e\left(\left[h_i^l \,\Vert\, h_j^l \,\Vert\, d_{ij}^2 \,\Vert\, \mathrm{RBF}(d_{ij})\right]\right) \tag{11}$$

Feature update $\phi_h$: The input is the concatenation of the node embedding $h_i^l$ and its aggregated message $m_i$. After the transformation defined in Equation (12), the output is obtained as the updated node embedding.

$$h_i^{l+1} = \phi_h\left(h_i^l, m_i\right) = \mathrm{MLP}_h\left(\left[h_i^l \,\Vert\, m_i\right]\right) \tag{12}$$

Coordinate update $\phi_x$: The input is the same as that of $\phi_e$, and the output is a scalar value defined as follows.

$$\phi_x\left(h_i^l, h_j^l, d_{ij}^2, \mathrm{RBF}(d_{ij})\right) = \mathrm{MLP}_x\left(\left[h_i^l \,\Vert\, h_j^l \,\Vert\, d_{ij}^2 \,\Vert\, \mathrm{RBF}(d_{ij})\right]\right) \in \mathbb{R} \tag{13}$$
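The three updates described above can be put together in a simplified, pure-Python sketch of one EGCL-style layer. It follows the textual description only: messages are computed from the two node embeddings, the squared distance, and the RBF features; node features are updated from the aggregated message; and coordinates move along relative position vectors weighted by a scalar. Sum aggregation, the absence of any distance normalization or residual connection, the placement of the SiLU between the two layers, the random weights, and all dimensions are assumptions made for illustration.

```python
import math
import random

def silu(v):
    return v / (1.0 + math.exp(-v))

def make_mlp(n_in, n_hidden, n_out, rng):
    """Two fully connected layers with a SiLU in between; random weights stand in for learned ones."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(inp):
        hidden = [silu(sum(w * v for w, v in zip(row, inp))) for row in w1]
        return [sum(w * v for w, v in zip(row, hidden)) for row in w2]

    return forward

def rbf_expand(d, centers, sigma=1.0):
    return [math.exp(-((d - mu) ** 2) / (2.0 * sigma ** 2)) for mu in centers]

def egcl(x, h, phi_e, phi_h, phi_x, centers):
    """One layer over a fully connected graph (assumes at least two nodes)."""
    new_x, new_h = [], []
    for i in range(len(x)):
        messages, shift = [], [0.0, 0.0, 0.0]
        for j in range(len(x)):
            if i == j:
                continue
            diff = [a - b for a, b in zip(x[i], x[j])]
            d2 = sum(c * c for c in diff)
            # Invariant edge input: both embeddings, squared distance, RBF features.
            edge_in = h[i] + h[j] + [d2] + rbf_expand(math.sqrt(d2), centers)
            messages.append(phi_e(edge_in))
            weight = phi_x(edge_in)[0]  # one scalar per edge
            shift = [s + weight * c for s, c in zip(shift, diff)]
        m_i = [sum(col) for col in zip(*messages)]  # sum aggregation (assumed)
        new_h.append(phi_h(h[i] + m_i))
        new_x.append([a + s for a, s in zip(x[i], shift)])
    return new_x, new_h

rng = random.Random(0)
feat_dim, msg_dim = 2, 4
centers = [0.0, 1.5, 3.0]
edge_dim = 2 * feat_dim + 1 + len(centers)
phi_e = make_mlp(edge_dim, 8, msg_dim, rng)
phi_h = make_mlp(feat_dim + msg_dim, 8, feat_dim, rng)
phi_x = make_mlp(edge_dim, 8, 1, rng)

x = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.5, 0.3]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
x_next, h_next = egcl(x, h, phi_e, phi_h, phi_x, centers)
```

Because every network input is built from invariant quantities and the coordinate shift is a weighted sum of relative position vectors, rotating and translating the input coordinates leaves `h_next` unchanged and transforms `x_next` in the same way.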

The equivariance of the convolutional layers with respect to the Euclidean group $E(3)$ is strictly preserved by design. Specifically, both the message passing function and the feature update function rely solely on scalar features and pairwise Euclidean distances, which are invariant under $E(3)$ transformations (including rotation and translation). Meanwhile, the coordinate update function is linear with respect to the relative positions of nodes, ensuring that the overall architecture remains equivariant under spatial transformations. In addition, the RBF distance encoding is also invariant under $E(3)$ transformations. Let the coordinates of two nodes $i$ and $j$ be $x_i$ and $x_j$, respectively. Under an $E(3)$ transformation consisting of a rotation $R$ and a translation vector $t$, the coordinate transformation is defined as:

$$ x_i' = R x_i + t, \quad x_j' = R x_j + t \tag{14} $$

The pairwise Euclidean distance is preserved:

$$ \lVert x_i' - x_j' \rVert = \lVert R (x_i - x_j) \rVert = \lVert x_i - x_j \rVert \tag{15} $$

since rotation preserves the norm. The Gaussian RBF also remains unchanged under transformation:

$$ \mathrm{RBF}\left(\lVert x_i' - x_j' \rVert\right) = \mathrm{RBF}\left(\lVert x_i - x_j \rVert\right) \tag{16} $$

Therefore, RBF features, used as scalar inputs in message and feature update functions, do not break the $E(3)$-equivariant structure of the network. Instead, they enhance its ability to encode fine-grained geometric information while strictly preserving equivariance.
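The invariance argument above can be checked numerically. The sketch below applies a random rotation and translation to two points and compares the distance and a Gaussian RBF encoding before and after. The RBF centers, the width `gamma`, and the helper names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a random matrix gives an orthogonal matrix;
    # flip one column if needed so that the determinant is +1 (a proper rotation).
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def gaussian_rbf(d, centers, gamma=2.0):
    return np.exp(-gamma * (d - centers) ** 2)

rng = np.random.default_rng(0)
x_i, x_j = rng.normal(size=3), rng.normal(size=3)
R, t = random_rotation(rng), rng.normal(size=3)

d_before = np.linalg.norm(x_i - x_j)
d_after = np.linalg.norm((R @ x_i + t) - (R @ x_j + t))

centers = np.linspace(0.0, 5.0, 16)   # hypothetical RBF centers
rbf_before = gaussian_rbf(d_before, centers)
rbf_after = gaussian_rbf(d_after, centers)
print(np.isclose(d_before, d_after), np.allclose(rbf_before, rbf_after))
```

The translation cancels in the difference of the two transformed points, and the rotation leaves the norm of that difference unchanged, so both comparisons hold up to floating-point error.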

After applying multiple EGCL layers, the updated graph consists of new coordinates Inline graphic and new features Inline graphic. Since we are only interested in the predicted noise, we only take the tuple Inline graphic as the final output of the EGNN [17].

R-group generation stage

In contrast to the scaffold, which serves as the structural skeleton of a molecule and defines its global geometry and core chemical features, R-groups primarily contribute to functional modification, spatial complementarity, and target specificity. These structures typically exhibit higher local complexity and structural diversity, significantly influencing molecular properties such as hydrophobicity, polarity, charge distribution, and geometric compatibility with the protein binding pocket. Therefore, incorporating R-groups is not merely a spatial extension of the scaffold but a critical step toward optimizing the molecule’s functional and pharmacological properties.

The R-group generation stage emphasizes the fine-grained modeling of local structural extensions, which is characterized by the following key aspects. First, R-group generation is conditioned on the prederived scaffold, where new atoms are added at positions determined by the model, based on spatial and chemical considerations. Second, to ensure spatial plausibility, the generated R-groups must align properly with the scaffold, avoid steric clashes, and maintain chemically reasonable growth directions. Finally, although explicit bond types and molecular graph topology are not modeled, the use of spatial guidance and geometric constraints allows the model to implicitly learn and preserve chemical validity at the coordinate level. To systematically achieve the generation and integration of R-groups, we further decompose this stage into the following steps.

Forward noise process of R-groups

In the forward noise injection process, noise is added to the atoms of the R-group. Inspired by previous methods [20], a virtual atom padding mechanism is introduced to support variable-length structure generation. Since most R-groups contain fewer than 10 atoms, each R-group is padded with virtual atoms to a fixed size of 10 atoms.
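A minimal sketch of such padding is given below. Only the fixed size of 10 atoms comes from the text; placing virtual atoms at the origin, encoding them as an extra one-hot channel, and the function name `pad_rgroup` are assumptions made for illustration.

```python
import numpy as np

MAX_RGROUP_ATOMS = 10  # fixed size stated in the text

def pad_rgroup(coords, atom_types, n_types, max_atoms=MAX_RGROUP_ATOMS):
    """Pad an R-group to max_atoms entries.

    Returns padded coordinates, one-hot atom types with one extra
    "virtual" channel (an assumption), and a mask that is True for real atoms.
    """
    coords = np.asarray(coords, dtype=float).reshape(-1, 3)
    atom_types = np.asarray(atom_types, dtype=int)
    n = len(atom_types)
    if n > max_atoms:
        raise ValueError(f"R-group has {n} atoms, more than {max_atoms}")
    padded_xyz = np.zeros((max_atoms, 3))
    padded_xyz[:n] = coords
    one_hot = np.zeros((max_atoms, n_types + 1))
    one_hot[np.arange(n), atom_types] = 1.0
    one_hot[n:, n_types] = 1.0          # last channel marks virtual atoms
    mask = np.zeros(max_atoms, dtype=bool)
    mask[:n] = True
    return padded_xyz, one_hot, mask

# Example: a three-atom R-group with hypothetical type indices (0 = C, 1 = N, 2 = O).
xyz, types, mask = pad_rgroup(
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.2, 0.0]], [0, 0, 2], n_types=3)
```

After generation, atoms predicted as virtual would simply be dropped, which is how a fixed-size tensor can represent R-groups of different sizes.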

Reverse denoising of R-groups

During the reverse denoising step, the protein pocket and scaffold provide structural guidance for generating chemically valid R-groups. An E(3)-equivariant GNN predicts the atom coordinates and types. A polynomial noise schedule ensures stable generation, aligning the R-group with the scaffold and protein pocket constraints.
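The exact form of the polynomial noise schedule is not given in this section. As a hedged illustration, the sketch below uses one polynomial form commonly seen in equivariant diffusion models, where the signal coefficient decays as one minus a power of the normalized time step. The number of steps, the power, the small offset `s`, and the function names are assumptions.

```python
import numpy as np

def polynomial_schedule(n_steps, power=2.0, s=1e-5):
    """Return (alpha, sigma) arrays of length n_steps + 1.

    alpha_t = (1 - 2s) * (1 - (t / T)^power) + s, sigma_t = sqrt(1 - alpha_t^2),
    so the process is variance preserving: alpha_t^2 + sigma_t^2 = 1.
    """
    t = np.arange(n_steps + 1)
    alpha = (1.0 - 2.0 * s) * (1.0 - (t / n_steps) ** power) + s
    sigma = np.sqrt(1.0 - alpha ** 2)
    return alpha, sigma

def forward_noise(x0, t, alpha, sigma, rng):
    # Sample x_t ~ N(alpha_t * x0, sigma_t^2 I) and return the noise that was used.
    eps = rng.normal(size=x0.shape)
    return alpha[t] * x0 + sigma[t] * eps, eps

alpha, sigma = polynomial_schedule(500)     # 500 steps is an illustrative choice
rng = np.random.default_rng(0)
x0 = rng.normal(size=(10, 3))               # e.g. padded R-group coordinates
x_t, eps = forward_noise(x0, 250, alpha, sigma, rng)
```

A schedule of this kind keeps most of the signal at early steps and removes it smoothly toward the final step, which is one reason polynomial schedules are often described as stable for coordinate diffusion.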

Bond construction

In order to construct the chemical bonds between atoms, we use YuelBond [40], a multimodal GNN-based framework, to predict bond orders within molecules. Traditional methods, such as OpenBabel [41], typically rely on geometric and valence rules to infer molecular bond structures from atomic coordinates and features. However, these approaches are limited in practice, particularly when molecular geometries are distorted or imprecise, which makes it difficult to reconstruct chemical bonds accurately. As a result, molecular properties such as SA and QED, which are critical for drug design and other applications, tend to be predicted less accurately.

In this study, YuelBond addresses these challenges with a graph-based learning approach that takes into account not only atomic connectivity but also local chemical environments and interatomic distances. Specifically, a foundational molecular graph is constructed by treating atoms as nodes and encoding their elemental types as node features. If the distance between two atoms is Inline graphic Å, which is within the typical range for covalent bonds, an edge is formed between them, representing a potential chemical bond. The interatomic distance is then embedded as an initial edge feature. Next, an edge-focused iterative refinement process progressively refines these bond candidates into meaningful bond representations. Each iteration involves three key steps: first, contextual information is constructed by combining the features of each atom with those of adjacent edges; second, edge features are updated by incorporating interatomic distances, capturing bond-type patterns while tolerating geometric distortions in noisy structures; finally, atomic node features are updated by aggregating neighboring information, ensuring each optimization step leverages the refined local environment. After these steps, the optimized edge representations are projected onto four main bond types (single, double, aromatic, and triple) using a linear layer, and a softmax function is applied to generate the probability distribution over these categories [40]. This enables YuelBond to make robust predictions even in the presence of noise or distortions in molecular geometries, providing more accurate bonding representations and improving the overall molecular quality accordingly.
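The first and last steps of this pipeline (candidate edges from a distance cutoff, and a linear layer followed by softmax over four bond types) can be sketched as follows. This is not YuelBond's code or API: the function names are hypothetical, the iterative refinement is omitted, the edge features and weights are random stand-ins, and the cutoff is left as a parameter whose value in the usage example is purely illustrative.

```python
import numpy as np

BOND_TYPES = ("single", "double", "aromatic", "triple")

def candidate_edges(coords, cutoff):
    """Return (i, j, distance) for every atom pair closer than `cutoff` (in Å)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    rows, cols = np.where(np.triu(dist < cutoff, k=1))   # each pair once, i < j
    return [(int(i), int(j), float(dist[i, j])) for i, j in zip(rows, cols)]

def bond_type_probabilities(edge_features, weight, bias):
    """Linear projection to four logits per edge, then a numerically stable softmax."""
    logits = edge_features @ weight + bias
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Three heavy atoms in a C-C-O arrangement; the cutoff value is an illustrative choice.
coords = [[0.0, 0.0, 0.0], [1.52, 0.0, 0.0], [2.20, 1.20, 0.0]]
edges = candidate_edges(coords, cutoff=2.0)

rng = np.random.default_rng(0)
edge_features = rng.normal(size=(len(edges), 16))        # stand-in for refined edge features
weight, bias = rng.normal(size=(16, len(BOND_TYPES))), np.zeros(len(BOND_TYPES))
probs = bond_type_probabilities(edge_features, weight, bias)
predicted = [BOND_TYPES[k] for k in probs.argmax(axis=-1)]
```

In a trained model the edge features would come from the refinement iterations described above, so the predicted classes here carry no chemical meaning.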

Decoupled training strategy

Instead of sharing parameters across the two stages, we train the scaffold and R-group generators independently. Although parameter sharing is common in diffusion-based molecular generation, our experiments show that it causes the scaffold loss to dominate, hindering convergence and degrading R-group generation quality.

We therefore adopt a decoupled training strategy in which the two networks are trained separately with their own parameters and training data. Each model independently optimizes its reverse diffusion process, enabling balanced learning and stable convergence. We further apply gradient clipping to enhance numerical stability and prevent exploding gradients. The generation objective is to minimize the L2 loss between the true noise and the predicted noise:

$$ \mathcal{L} = \lVert \epsilon - \hat{\epsilon} \rVert^2 \tag{17} $$
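As a rough illustration of this objective and of gradient clipping, the sketch below computes a mean squared error between true and predicted noise and rescales a list of gradients to a maximum global norm. The optional mask over padded atoms, the averaging over atoms, and the function names are assumptions; the paper's exact reduction and clipping threshold are not stated here.

```python
import numpy as np

def noise_l2_loss(eps_true, eps_pred, mask=None):
    """Squared L2 error per atom, averaged over (optionally masked) atoms."""
    per_atom = ((eps_true - eps_pred) ** 2).sum(axis=-1)
    if mask is None:
        return float(per_atom.mean())
    mask = np.asarray(mask, dtype=float)
    return float((per_atom * mask).sum() / mask.sum())

def clip_grad_norm(grads, max_norm):
    """Scale all gradients so that their global L2 norm is at most max_norm."""
    total = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

eps_true = np.zeros((2, 3))
eps_pred = np.ones((2, 3))
loss = noise_l2_loss(eps_true, eps_pred)          # each atom contributes 1 + 1 + 1

clipped, norm_before = clip_grad_norm([np.array([3.0, 4.0])], max_norm=1.0)
```

In the decoupled setting described above, each of the two generators would have its own loss of this kind and its own optimizer state.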

Experiments

In this section, we first introduce the dataset and preprocessing procedure (including fragment acquisition and the cutting process of scaffold-R-group pairs), followed by a detailed description of experimental setup and result analysis.

Dataset and preprocessing

Dataset collection

In our experiments, we collected samples from the CrossDocked2020 dataset [42], which comprises approximately 22.5 million protein–ligand complexes. For each complex, the ligand is associated with multiple receptor pockets curated from the Protein Data Bank (PDB), and docking is performed using the smina tool through the Pocketome pipeline. For each pocket, its corresponding ligands are docked against all candidate receptors. To ensure evaluation consistency with prior work [43], we kept only samples whose binding poses have a root-mean-square deviation (RMSD) <1 Å, reflecting high-quality docking conformations. To reduce redundancy and enhance data diversity, we further applied MMseqs2-based [44] clustering using a 30% sequence identity threshold. The final dataset includes 100 000 protein–ligand pairs for training and an additional 100 samples reserved for testing.

Data preprocessing

In order to effectively train our framework, we use a BRICS-based molecular fragmentation strategy [45] (as shown in Fig. 5) to decompose each compound into a set of smaller structural fragments as starting points for molecule generation. Furthermore, to ensure the subsequent fragments can efficiently and controllably integrate with the generated portion, forming a complete scaffold, we select the molecular cleavage sites as growth anchors. At the same time, to ensure the chemical plausibility and structural applicability of these fragments, we further apply the “rule of three” principle [11–13] to filter the fragments, ensuring that each initial fragment possesses desirable drug-like properties and is suitable for structure-driven molecular design. Specifically, each fragment meets the following conditions.

Figure 5.

Alt text: BRICS fragmentation: splitting a molecule into fragments.

Illustration of BRICS-based molecular fragmentation.

  • The molecular weight is <300 Da;

  • The lipophilicity, measured by either logP (octanol–water partition coefficient) or logD at pH 7.4, is ≤3;

  • The number of hydrogen bond donors (e.g. N–H, O–H) and acceptors (e.g. N, O atoms) is no more than 3; the topological polar surface area (TPSA) is ≤60 Å²; and the fragment contains no rotatable bonds.
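The fragment conditions listed above can be expressed as a simple filter over precomputed descriptors. In the sketch below, the class and function names are hypothetical, the descriptor values in the example are illustrative rather than measured, and the donor and acceptor limits are assumed to apply to each count separately. In practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass
class FragmentDescriptors:
    mol_weight: float      # Da
    logp: float            # logP, or logD at pH 7.4
    h_donors: int
    h_acceptors: int
    tpsa: float            # Å^2
    rotatable_bonds: int

def passes_fragment_filter(f: FragmentDescriptors) -> bool:
    """Apply the fragment conditions listed in the text."""
    return (
        f.mol_weight < 300.0
        and f.logp <= 3.0
        and f.h_donors <= 3        # assumed: limit applies to donors
        and f.h_acceptors <= 3     # and to acceptors separately
        and f.tpsa <= 60.0
        and f.rotatable_bonds == 0
    )

# Illustrative descriptor values, not taken from the paper's data.
small_ring = FragmentDescriptors(93.1, 0.9, 1, 1, 26.0, 0)
flexible_chain = FragmentDescriptors(210.3, 2.4, 2, 3, 55.0, 4)
kept = [f for f in (small_ring, flexible_chain) if passes_fragment_filter(f)]
```

The second example fails only because of its rotatable bonds, which shows how strict the rigidity condition is compared with the other thresholds.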

Furthermore, to promote the diversity and synthesizability of generated molecules, we used LibINVENT [46] to slice each molecule into a scaffold and R-group pair. Specifically, LibINVENT applies 37 experimentally verified reaction SMIRKS templates (as shown in Fig. 6), covering a wide range of typical organic transformations and ensuring that each bond breakage corresponds to a chemically feasible reaction pathway. By applying LibINVENT, we obtain a training set of 76 108 tuples of molecular scaffolds, pockets, and R-groups, and a testing set of 43 such tuples.

Figure 6.

Alt text: LibINVENT decomposition: splitting a molecule into core and substituents using reaction rules.

Illustration of the reaction-based molecular decomposition strategy used in LibINVENT.

We then applied the BRICS rules to further slice the molecular scaffolds and filter out chemically unreasonable molecules, so that the resulting fragments have better chemical plausibility and synthetic feasibility.

After the whole preprocessing procedure, we obtained a training set, a validation set, and a testing set with 50 403 samples, 12 601 samples, and 20 samples, respectively.

Experimental setup

Hyper-parameter setting

In experiments, the proposed framework was implemented with 6 layers of EGNN, a hidden dimension of 128, a learning rate of 0.0002, a batch size of 32, and a total of 2000 training iterations.

Baselines

For performance comparison, we select the following three representative models as baselines.

  • Pocket2Mol is an E(3)-equivariant generative network, which progressively predicts and adds new atoms based on the existing environment to derive a complete molecule.

  • FLAG is a fragment-based ligand generation framework that generates 3D molecules segment by segment, and further refines the molecular geometry based on predicted rotational angles and structural optimization.

  • PMDM is a conditional generative model that combines dual equivariant GNNs, capable of simultaneously capturing both local and global molecular dynamics, thereby efficiently generating drug-like molecules with reasonable binding affinity.

For a fair comparison, we use the same 20 protein pockets for the three baselines and our method. Owing to methodological differences, Pocket2Mol and PMDM generate molecules at the atom level without any fragment processing. FLAG is built on fragment-based stepwise assembly; although it also uses fragments, its molecules are assembled from a vocabulary built from the entire training set. In contrast, FDC-Diff is a diffusion-based method that focuses on growing initial fragments that satisfy chemical plausibility and structural practicality.

Evaluation metrics

To systematically evaluate the properties and quality of generated molecules, we adopt several evaluation metrics which were widely used in prior studies [47]. These metrics cover diverse aspects, including binding performance, drug-likeness, and structural diversity.

  • Docking Affinity (Vina Score) estimates the binding strength between the generated molecule and the target protein pocket using QVina, based on predicted molecular conformations and pocket interactions.

  • High Affinity measures the proportion of generated molecules that exhibit higher binding affinity to the protein pockets compared with the reference ligands in the testing set.

  • Drug-likeness Score (QED) evaluates the likelihood that a generated molecule serves as a drug candidate, with higher scores indicating stronger drug-likeness.

  • SA reflects the ease of chemical synthesis, with normalized scores in the range [0,1]; higher values indicate easier synthesis.

  • Lipophilicity (LogP) [48] represents the octanol–water partition coefficient of a molecule; the recommended range for drug-like compounds lies between −0.4 and 5.6.

  • Lipinski Rule Compliance (Lip.) [49] counts how many of Lipinski’s “Rule of Five” criteria are satisfied, commonly used to assess oral drug-likeness.

  • Similarity to Training Set (Sim.) computes the Tanimoto similarity between each generated molecule and its most similar counterpart in the training set, indicating the novelty relative to known compounds.

  • Structural Diversity (Div.) quantifies the variation among generated molecules for each protein pocket, defined as 1 minus the average pairwise Tanimoto similarity, with higher values implying broader chemical exploration.
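The last two metrics in the list above can be written down directly from their definitions. The sketch below represents each fingerprint as a set of on-bit indices; the fingerprint type used in the paper is not specified here, and the function names are assumptions.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_to_training(fp, training_fps):
    """Sim.: similarity to the most similar training-set molecule."""
    return max(tanimoto(fp, t) for t in training_fps)

def diversity(fps):
    """Div.: 1 minus the average pairwise Tanimoto similarity within one pocket."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        raise ValueError("diversity needs at least two molecules")
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Toy fingerprints: every pair shares one of three distinct bits.
generated = [{1, 2}, {2, 3}, {1, 3}]
training = [{1, 2, 3}, {7, 8}]
sim = similarity_to_training(generated[0], training)
div = diversity(generated)
```

For the toy data each pairwise similarity is 1/3, so the diversity is 2/3, and the first generated fingerprint shares two of three bits with its nearest training neighbor.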

Results and analysis

In this subsection, we comprehensively evaluate the performance of our proposed method and multiple baseline methods in the molecular generation task. Specifically, we analyze from two perspectives: the average performance of common generation indicators, and the structural characteristics of the generated molecules. Furthermore, we conduct ablation studies to investigate the contribution of main components in our proposed framework. The experimental results and analysis will be elaborated in detail as follows.

Evaluation on general metrics

For each target protein in the test set (consisting of a total of 20 target proteins), we generate 100 molecules for performance evaluation (with a total of 2000 molecules). Note that the sizes of generated molecules are sampled from the size distribution of the training set. The overall results of FDC-Diff and the baseline models are shown in Table 1. The results in Table 1 demonstrate that our model provides better performance on almost all evaluation metrics (except for similarity with the training set), compared to the baseline methods.

Table 1.

Comparing the molecular properties of the test set and the generated molecules by different methods

| Methods | Vina Score (↓) | High Affi. (↑) | QED (↑) | SA (↑) | LogP | Lip. (↑) | Sim. (↓) | Div. (↑) |
|---|---|---|---|---|---|---|---|---|
| Testset | −8.222 ± 0.38 | – | 0.492 ± 0.23 | 0.736 ± 0.15 | 3.012 ± 2.16 | 3.012 ± 2.16 | – | – |
| PMDM | −8.062 ± 2.96 | 0.421 ± 0.24 | 0.576 ± 0.16 | 0.627 ± 0.15 | 2.212 ± 2.14 | 4.770 ± 0.47 | 0.222 ± 0.01 | 0.895 ± 0.01 |
| FLAG | −7.500 ± 2.50 | 0.413 ± 0.31 | 0.467 ± 0.19 | 0.672 ± 0.15 | 2.119 ± 2.21 | 4.510 ± 0.82 | 0.250 ± 0.07 | 0.896 ± 0.01 |
| Pocket2Mol | −8.554 ± 3.00 | 0.401 ± 0.38 | 0.558 ± 0.18 | 0.710 ± 0.14 | 2.221 ± 2.32 | 4.693 ± 0.60 | 0.248 ± 0.06 | 0.891 ± 0.01 |
| FDC-Diff | −8.730 ± 1.92 | 0.563 ± 0.34 | 0.643 ± 0.17 | 0.729 ± 0.11 | 3.408 ± 1.64 | 4.772 ± 0.49 | 0.286 ± 0.07 | 0.897 ± 0.01 |

The best results are highlighted in bold. Upward arrows indicate that higher values are better, and downward arrows signify that smaller values are better.

To rigorously validate the performance gains of FDC-Diff, we performed a Wilcoxon signed-rank test [50, 51] across the 20 test pockets against each of the three baseline models; the results are presented in Table 2. The improvement of FDC-Diff over every baseline is statistically significant on all three metrics (SA, QED, and Vina score), with p < 0.05 in every comparison. These observations confirm that FDC-Diff not only surpasses the current state-of-the-art models in key pharmacological properties but also maintains robust performance across various evaluation metrics.

Table 2.

Results of statistical test on SA, QED, and Vina scores

| Comparison | SA (p-value) | QED (p-value) | Vina Score (p-value) |
|---|---|---|---|
| FDC-Diff vs PMDM | 9.536e-07 | 0.0133 | 1.335e-05 |
| FDC-Diff vs FLAG | 0.0001 | 2.861e-06 | 0.0036 |
| FDC-Diff vs Pocket2Mol | 0.0332 | 0.0003 | 0.0045 |
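For readers unfamiliar with the test, the sketch below computes an exact one-sided Wilcoxon signed-rank p-value for paired per-pocket scores. It is a simplified illustration, not the authors' analysis script: it assumes no zero differences and no tied magnitudes, and the section does not state which variant (one-sided or two-sided, exact or approximate) was used. In practice a library routine such as `scipy.stats.wilcoxon` would normally be preferred.

```python
def exact_signed_rank_pvalue(x, y):
    """Exact one-sided p-value for the alternative "x tends to exceed y".

    Returns (W_minus, p), where W_minus is the sum of ranks of negative
    differences and p = P(W <= W_minus) under the null hypothesis.
    """
    diffs = [a - b for a, b in zip(x, y)]
    magnitudes = sorted(abs(d) for d in diffs)
    if magnitudes[0] == 0 or any(m1 == m2 for m1, m2 in zip(magnitudes, magnitudes[1:])):
        raise ValueError("this sketch assumes non-zero, untied differences")
    rank = {m: r for r, m in enumerate(magnitudes, start=1)}
    w_minus = sum(rank[abs(d)] for d in diffs if d < 0)

    # counts[s] = number of subsets of the ranks {1..n} whose sum is s.
    n = len(diffs)
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return w_minus, sum(counts[: w_minus + 1]) / 2 ** n

# 20 hypothetical pockets where the first method wins on every pocket.
method = list(range(1, 21))
baseline = [0] * 20
w, p = exact_signed_rank_pvalue(method, baseline)
```

With 20 pairs, the smallest p-value this exact one-sided test can produce is 2^-20, roughly 9.54e-07, which occurs when every paired difference has the same sign.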

It is worth noting that TAGMol [52] sets property thresholds based on the property distribution observed in currently marketed drugs. According to its model description and the results of Ziv et al. [53], molecules with specific properties below a certain threshold are generally not considered “high-value” candidate molecules. In our study, we observed that molecules generated by other methods often exhibit highly unbalanced scores (as shown in Fig. 7): some molecules have extremely high SA scores but low QED scores, or vice versa, and many generated molecules are chemically meaningless or unreasonable. In contrast, because it incorporates prior knowledge into the molecule generation process, FDC-Diff is more effective in maintaining structural rationality.

Figure 7.

Alt text: Plot showing counts of molecules that meet combined SA and QED thresholds across different cutoff settings.

Number of molecules simultaneously satisfying different SA and QED thresholds.

Quantitative comparison of generated and reference molecules

In this subsection, we selected two target proteins (i.e. 4z2g and 1d7j) as examples to quantitatively evaluate the generated molecules, and the results are shown in Fig. 8 and Fig. 9, respectively. For each target protein, we selected four generated molecules by our model, and we compare the docking performance of the generated molecules with the reference molecules.

Figure 8.

Alt text: Comparison on protein 4z2g: reference vs generated ligands.

Docking performance comparison between the reference molecule and generated molecules on the protein 4z2g.

Figure 9.

Alt text: Comparison on protein 1d7j: reference vs generated ligands.

Docking performance comparison between the reference molecule and generated molecules on the protein 1d7j.

We find that the reference molecules are usually macrocyclic compounds with low conformational strain, which bind well to the protein pockets but have very limited practical value due to poor SA and minimal drug-likeness. The generated molecules outperform the reference molecules overall, with clear advantages in both SA and QED. This observation further supports that our model can generate novel molecules that bind to the target proteins, rather than merely modifying the reference molecules.

When evaluating the overall quality of generated molecules, conventional metrics provide a certain degree of reference, but a deeper understanding of model performance requires attention to local structural features. To this end, we conducted the following experiments and analyses.

Atomic composition analysis

Previous studies [54, 55] suggest that the non-CH/non-H ratio of a candidate drug (defined as the ratio of atoms that are neither carbon nor hydrogen to all non-hydrogen atoms) should generally fall within the range 0.05–0.50. As shown in Fig. 10a, our method performs best on this metric: approximately 99.3% of the generated molecules fall within the desirable range.
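The ratio defined above is easy to compute from a list of element symbols. The sketch below is an illustration only; the function names are assumptions, and the range bounds are the ones quoted in the text.

```python
def non_ch_ratio(elements):
    """Ratio of atoms that are neither C nor H to all non-hydrogen atoms."""
    heavy = [e for e in elements if e != "H"]
    if not heavy:
        raise ValueError("molecule has no heavy atoms")
    hetero = [e for e in heavy if e != "C"]
    return len(hetero) / len(heavy)

def in_desirable_range(ratio, low=0.05, high=0.50):
    return low <= ratio <= high

# Ethanol (C2H6O): one heteroatom out of three heavy atoms.
ethanol = ["C", "C", "O"] + ["H"] * 6
ratio = non_ch_ratio(ethanol)
ok = in_desirable_range(ratio)
```

A pure hydrocarbon gives a ratio of 0 and falls outside the range, while a molecule made mostly of heteroatoms exceeds the upper bound.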

Figure 10.

Alt text: Distribution plots comparing our model and baselines for non-CH/non-H ratio, heavy-atom count, and total atom count.

Distributions of (non-CH/non-H) ratios, heavy atom count, and total atom count for molecules generated by our model and baselines.

In addition, the Retro Drug Design framework [56] emphasizes that most approved CNS small-molecule drugs typically contain approximately 27–37 heavy atoms. Moreover, several studies [48, 57] have proposed that the total number of atoms (including hydrogen) in drug-like compounds should generally fall within the range of 20–70. Our model adheres well to these structural criteria, generating a substantial number of molecules that fall within these ranges (shown in Fig. 10b and c).

Ring structure analysis

As shown in Fig. 11, we also analyzed the distribution of the number of rings in generated molecules across all methods, as well as in the training and testing sets. The distribution produced by our model aligns closely with that of the training and test sets, indicating that it effectively captures the true data distribution from a local structural perspective.

Figure 11.

Alt text: Distribution plot of ring counts for molecules generated by our model and baselines.

Distributions of the number of rings in generated molecules by our model and baselines.

Statistical divergence of bond length and torsional angle distributions

In addition, we analyzed the distributions of bond lengths, bond angles, and dihedral angles in the generated molecules, comparing with those in the training set (as shown in Fig. 12). Note that we use the RDKit to calculate bond angles and dihedral angles in radians. Furthermore, we measured the KL divergence between the distributions of common bond angles and dihedral angles in molecules generated by different methods and those in the training set (as shown in Figs 13 and 14). A lower KL divergence indicates better model performance.
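One common way to estimate such a divergence is to histogram both sets of angles on a shared grid and compute a discrete KL divergence. The sketch below follows that approach; the number of bins, the smoothing constant, the direction of the divergence (generated relative to reference), and the synthetic angle samples are all assumptions, since the section does not specify them.

```python
import numpy as np

def kl_divergence(generated, reference, bins=50, value_range=(0.0, np.pi), eps=1e-10):
    """Discrete KL(generated || reference) between two samples of angles in radians."""
    p, _ = np.histogram(generated, bins=bins, range=value_range)
    q, _ = np.histogram(reference, bins=bins, range=value_range)
    p = p.astype(float) + eps      # smoothing avoids log(0) in empty bins
    q = q.astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic bond-angle samples, for illustration only.
rng = np.random.default_rng(0)
train_angles = rng.normal(1.90, 0.05, size=5000)
close_model = rng.normal(1.90, 0.05, size=5000)
far_model = rng.normal(2.10, 0.15, size=5000)

kl_close = kl_divergence(close_model, train_angles)
kl_far = kl_divergence(far_model, train_angles)
```

For dihedral angles the same function can be used with `value_range=(-np.pi, np.pi)`.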

Figure 12.

Alt text: Distribution plots comparing generated molecules and training samples for C–O bond lengths, C–O–C bond angles, and two dihedral-angle types.

Distributions of C–O bond lengths, C–O–C bond angles, C–C–C–C dihedral angles, and C–C–C–O dihedral angles in generated molecules and training samples.

Figure 13.

Alt text: KL-divergence comparison of bond-angle distributions between our model and baseline methods.

Comparison between our model and baselines on KL divergence of common bond angles.

Figure 14.

Alt text: KL-divergence comparison of dihedral-angle distributions between our model and baseline methods.

Comparison between our model and baselines on KL divergence of common dihedral angles.

The results demonstrate that, in terms of bond angle distributions (shown in Fig. 13), our method shows lower divergence than baseline methods for most angle types (except for Inline graphic). Regarding dihedral angle distributions, moreover, our method outperforms the baseline methods for most types (shown in Fig. 14). These observations suggest that our model is able to effectively preserve geometric features, generating structurally reasonable molecules accordingly.

In summary, our method not only achieves balanced performance across multiple drug-likeness metrics but also demonstrates clear advantages in terms of structural rationality, showcasing stronger generalizability and high likelihood of producing potential drug candidates.

Ablation study

In this section, we conduct an ablation study to investigate the contributions of three main components in our model (i.e. Rule-of-Three (Ro3) fragment constraint, the adaptive assembly mechanism, and the decoupled training strategy) to the performance improvement on the task of fragment-based molecular generation. More specifically, three variants of FDC-Diff are built for performance comparison. “FDC-Diff_uf” denotes the variant without the filtering process, i.e. the Rule-of-Three (Ro3) fragment constraint of FDC-Diff is removed and replaced by random initial fragment selection; “FDC-Diff_ob” denotes the variant in which the YuelBond mechanism is replaced by the OpenBabel assembly; “FDC-Diff_single” denotes the variant in which the decoupled training strategy is replaced by a single-stage generation process. The comparison results of evaluation metrics are presented in Table 3, and the distributions of key physicochemical properties are visualized in Fig. 15.

Table 3.

Results of ablation study

| Methods | Vina Score (↓) | High Affi. (↑) | QED (↑) | SA (↑) | LogP | Lip. (↑) | Div. (↑) |
|---|---|---|---|---|---|---|---|
| FDC-Diff_ob | −7.782 ± 1.80 | 0.269 ± 0.28 | 0.680 ± 0.14 | 0.331 ± 0.10 | 0.708 ± 1.06 | 4.908 ± 0.35 | 0.875 ± 0.01 |
| FDC-Diff_uf | −7.593 ± 1.89 | 0.251 ± 0.28 | 0.459 ± 0.19 | 0.659 ± 0.11 | 2.526 ± 2.21 | 4.276 ± 0.87 | 0.887 ± 0.01 |
| FDC-Diff_single | −7.980 ± 2.02 | 0.369 ± 0.30 | 0.455 ± 0.20 | 0.6134 ± 0.15 | 2.573 ± 2.16 | 4.373 ± 0.90 | 0.894 ± 0.02 |
| FDC-Diff | −8.730 ± 1.92 | 0.563 ± 0.34 | 0.643 ± 0.17 | 0.729 ± 0.11 | 3.408 ± 1.64 | 4.772 ± 0.49 | 0.897 ± 0.01 |

Upward arrows indicate that higher values are better, and downward arrows signify that smaller values are better.

Figure 15.

Alt text: Violin plots summarizing property distributions for FDC-Diff and three ablation variants, enabling visual comparison of difference.

Chemical property distributions of molecules generated by FDC-Diff and three variants in the ablation study.

From Table 3 and Fig. 15, we find that the full FDC-Diff model achieves the best overall balance across the evaluation metrics and physicochemical properties among the compared variants, indicating the necessity of integrating these components for generating high-quality drug candidates.

First, “FDC-Diff_uf” generated molecules with a high number of rotatable bonds (mean 6.95, median 7.0) and the fewest aromatic rings among the methods. Because its training and testing sets include fragments that do not comply with Ro3, this variant is more prone than our approach to generating molecules with MW > 500 and TPSA > 140. These results indicate the importance of the Ro3 constraint for maintaining structural validity and reserving sufficient chemical space.

Second, “FDC-Diff_ob” exhibits a severe deterioration in structural validity and SA (with the average SA score 0.33) despite achieving a relatively higher QED. The generated molecules display chemically implausible features: a disproportionate bias toward aliphatic rings over aromatic systems, a complete absence of hydrogen bond donors (NH/OH count of 0), and an atypically low LogP (0.7083). This structural bias stems from a limitation in OpenBabel’s deterministic bond perception algorithms. In contrast, YuelBond employs an edge-focused iterative refinement. By aggregating contextual node and edge features rather than relying solely on raw geometry, YuelBond accurately infers bonds despite structural distortions, ensuring the generation of chemically valid structures.

Third, “FDC-Diff_single” generated molecules with higher molecular weights and TPSA, along with excessive rotatable bonds (average 6.83). Notably, this method yielded the highest number of aliphatic rings, resulting in structures with high conformational flexibility. The possible reason is that for molecular generation requiring precise topology control, the decoupled strategy is essential to balance global scaffold generation with local functional refinement, thereby avoiding the entropic penalty associated with single-stage methods.

In summary, the three main components in FDC-Diff are all essential for the effectiveness of the proposed framework, ensuring a superior balance between structural validity, SA, and pharmacological properties.

Lead compound generation

In this section, we apply the proposed model to the task of lead compound generation to evaluate its effectiveness for real-world drug development.

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that leads to cognitive decline, memory loss, and impaired daily functioning, affecting millions of people worldwide. Currently, acetylcholinesterase inhibitors (AChEIs) are the primary therapeutic option for managing cognitive deficits in mild to moderate AD. These inhibitors work by blocking acetylcholinesterase (AChE), which increases acetylcholine levels in the brain and enhances cholinergic neurotransmission, thereby improving cognitive function. Although their efficacy is established, current AChEIs face significant limitations. First, the therapeutic benefits are modest and fail to halt disease progression, offering only symptomatic relief that diminishes over time. Furthermore, these drugs are associated with various adverse effects, including gastrointestinal disturbances such as nausea and diarrhea, cardiovascular risks such as bradycardia and syncope, and neuropsychiatric symptoms such as anxiety, aggression, and hallucinations. These side effects often necessitate dose reductions and restrict tolerability.

To address these challenges and further explore the practical significance of FDC-Diff, we applied the model to generate high-affinity molecules targeting the AChE enzyme. We utilized a subunit from the crystal structure of AChE with a chiral inhibitor reported by Catto and colleagues [58] (PDB ID: 6TT0) and analyzed it with the corresponding inhibitor MC1420 (labeled as N9T in the crystal), as shown in Fig. 16. Although MC1420 is an efficient reversible dual-binding-site AChE inhibitor with nanomolar affinity (IC50 = 19.2 nM) and 730-fold selectivity over butyrylcholinesterase, its X-ray crystal structure has a resolution of only 2.8 Å, which prevents clear determination of the absolute configuration and requires additional docking simulations for support. Furthermore, prolonged exposure in cytotoxicity tests reveals potential toxicity, which may limit its further application in drug development.

Figure 16.

Alt text: Composite panel showing N9T bound to AChE (PDB 6TT0), its key properties, and the fragmentation pipeline from reaction-rule cuts to BRICS fragments that yields a common seed fragment.

The ligand N9T in the AChE crystal structure of PDB 6TT0 and its physicochemical properties. Fragmentation strategy for N9T: First, reaction rules were applied to cleave the molecule at Sites 1 and 2, respectively, yielding two distinct sets of scaffold-R-group pairs. Subsequently, the BRICS rules were employed to further decompose these scaffolds, isolating an identical seed fragment as the start of generation process.

In this study, we applied FDC-Diff to fragment the inhibitor and obtained seed fragments along with scaffold-R-group pairs. These seed fragments were assembled with denoised generated components to produce 10 000 optimized molecules. As illustrated in Fig. 17, we plotted the distribution of three key properties (i.e. QED, SA, and Vina scores) for the generated molecules. The results show that all molecules have good Vina scores and SA scores, while the QED scores are moderate. Overall, FDC-Diff holds promise in drug discovery by producing molecules with favorable SA and binding affinity together with moderate drug-likeness, thereby offering new candidate compounds for treating Alzheimer’s disease.

Figure 17.

Alt text: Scatter plot of QED, SA, and docking score for 10 000 generated molecules.

Illustration of three key properties of generated molecules.
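The kind of property summary plotted in Fig. 17 can be sketched in a few lines. The snippet below is only an illustration: the records are invented, the thresholds are arbitrary, and the SA score is assumed to be normalized so that higher values mean easier synthesis (the convention used by the paper is not restated here). Only the direction of the Vina score (more negative is better) is standard.

```python
from statistics import mean, median

# Hypothetical property records; the numbers are invented for illustration
# and are NOT results reported in the paper.
molecules = [
    {"qed": 0.52, "sa": 0.71, "vina": -9.8},
    {"qed": 0.47, "sa": 0.66, "vina": -10.4},
    {"qed": 0.61, "sa": 0.74, "vina": -8.9},
    {"qed": 0.39, "sa": 0.58, "vina": -11.2},
]


def summarize(records, key):
    """Return basic descriptive statistics for one property."""
    values = [record[key] for record in records]
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def fraction_passing(records, key, threshold, higher_is_better=True):
    """Fraction of molecules whose property meets an assumed threshold."""
    if higher_is_better:
        passed = [r for r in records if r[key] >= threshold]
    else:
        passed = [r for r in records if r[key] <= threshold]
    return len(passed) / len(records)


if __name__ == "__main__":
    for key in ("qed", "sa", "vina"):
        print(key, summarize(molecules, key))
    # Illustrative cutoffs only: QED >= 0.5, Vina <= -9.0 kcal/mol.
    print("QED pass rate:", fraction_passing(molecules, "qed", 0.5))
    print("Vina pass rate:",
          fraction_passing(molecules, "vina", -9.0, higher_is_better=False))
```

Keeping the direction of each metric as an explicit argument avoids silently mixing "higher is better" and "lower is better" scores when the three properties are compared side by side.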

In addition, to further investigate the quality of the generated molecules, we selected two compounds with desirable properties (Fig. 18). In Fig. 18, hydrogen bonds with the ASP-72 and TYR-121 residues of the target protein pocket can be observed, whereas the original ligand (Fig. 16) does not form direct hydrogen bonds with the protein. Because hydrogen bonds typically improve the binding stability between ligand and receptor, they may lead to more durable binding. These characteristics support the effectiveness of the proposed method for deriving feasible candidates in real-world drug development.

Figure 18.

Alt text: Two example molecules generated by our method.

Illustration of two generated molecules with desirable properties.

Conclusions and future work

In this paper, we have proposed FDC-Diff, a fragment-based dual conditional diffusion framework for molecular generation. Starting from seed fragments, the model first progressively grows a reasonable molecular scaffold and then completes the substituents step by step on this scaffold, ultimately generating a realistic 3D molecule. Based on comprehensive experimental results, FDC-Diff demonstrates superior overall performance on standard molecular generation benchmarks and significantly outperforms existing baseline methods in generating structures with potential drug-like properties.

Although it achieves the best performance, the proposed model still has two limitations. First, the restricted expressive power of its underlying GNN architecture may cap its ability to capture intricate molecular interactions. Second, our model relies on atomic diffusion coupled with heuristic bond inference using YuelBond. Because bond types are inferred from static distances rather than by explicitly modeling bond formation, the proposed model is sensitive to minor coordinate deviations and occasionally yields chemically invalid structures.
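The sensitivity described in the second limitation can be illustrated with a deliberately simplified distance-threshold rule. This is not YuelBond and not the authors' procedure: the cutoffs below are assumptions placed at the midpoints between typical carbon-carbon bond lengths (about 1.20 Å triple, 1.34 Å double, 1.54 Å single), and the 1.70 Å upper limit is likewise an assumption.

```python
from math import dist


def infer_cc_bond(distance):
    """Assign a carbon-carbon bond type from distance alone (toy rule)."""
    if distance < 1.27:    # midpoint between typical triple and double lengths
        return "triple"
    if distance < 1.44:    # midpoint between typical double and single lengths
        return "double"
    if distance <= 1.70:   # assumed upper limit for a single bond
        return "single"
    return "none"


# Two hypothetical carbon atoms placed 1.43 A apart along the x axis.
atom_a = (0.0, 0.0, 0.0)
atom_b = (1.43, 0.0, 0.0)
# The same atom after a 0.02 A coordinate deviation.
atom_b_perturbed = (1.45, 0.0, 0.0)

if __name__ == "__main__":
    print(infer_cc_bond(dist(atom_a, atom_b)))            # double
    print(infer_cc_bond(dist(atom_a, atom_b_perturbed)))  # single
```

A shift of 0.02 Å, well within typical coordinate noise of a generated conformation, changes the inferred bond order and therefore the valence of both atoms, which is how a purely distance-based rule can produce chemically invalid structures.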

In future work, we first plan to develop a unified joint diffusion framework that simultaneously evolves atomic coordinates and bond topologies, thereby better capturing the true distributional properties of molecular graphs. Second, we will integrate rigorous physicochemical priors into protein–ligand modeling to ensure that generated compounds possess structural and functional realism. Third, moving beyond traditional metrics such as binding affinity or SA, we plan to adopt multi-modal evaluation strategies akin to the DFT_ANPD framework [59], which will allow us to assess both structural and semantic validity and steer molecule generation toward clinical drug standards. Finally, to mitigate the “black box” nature of current models, we will explore the integration of Kolmogorov–Arnold Networks to enhance mechanistic interpretability, offering clearer insights into the decision-making process and facilitating trust in drug design applications.

Key Points

  • A fragment-based dual conditional diffusion framework is proposed for 3D molecular structure generation. As far as we know, this is the first attempt to introduce a diffusion-based generative model to the task of fragment-based drug design.

  • The coordinated dual-diffusion architecture effectively models the distinctive semantics of scaffolds and R-groups, respectively.

  • Extensive experiments verify the effectiveness of the proposed method for the molecular generation task.

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful comments.

Contributor Information

Haotian Chen, Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China; School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China.

Yiting Shen, Detroit Green Technology Institute, Hubei University of Technology, Wuhan, Hubei 430079, PR China.

Jichun Li, School of Computing, Newcastle University, Newcastle upon Tyne NE4 5TG, United Kingdom.

Weizhong Zhao, Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China; School of Computer, Central China Normal University, Wuhan, Hubei 430079, PR China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China.

Author contributions

Haotian Chen (Investigation, Writing—original draft), Yiting Shen (Writing—review & editing), Jichun Li (Writing—review & editing), Weizhong Zhao (Conceptualization, Methodology, Investigation)

Conflicts of interest: None declared.

Funding

This work was partially supported by the National Natural Science Foundation of China (grant nos 62472192 and 62372205), the Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE (grant no. CCNU25JC008), the National Language Commission Key Research Project (grant no. ZDI145-56), and the Fundamental Research Funds for Central Universities (grant no. KJ02502022-0450).

Data availability

The data and source codes are available in GitHub at https://github.com/CHT713/FDC-Diff.

References

  • 1. Sharma VK, Bharatam PV. Artificial intelligence in drug discovery (AIDD). Curr Res Inf Pharm Sci 2022;16:3–7.
  • 2. Abbasi K, Razzaghi P, Gharizadeh A. et al. Computational drug design in the AI era: a systematic review of molecular representations, generative architectures, and performance assessment. Pharmacol Rev 2025;78:100095. 10.1016/j.pharmr.2025.100095
  • 3. Wang M, Hsieh C-Y, Wang J. et al. Relation: a deep generative model for structure-based de novo drug design. J Med Chem 2022;65:9478–92. 10.1021/acs.jmedchem.2c00732
  • 4. Hartenfeller M, Schneider G. De novo drug design. Chemoinformatics and computational chemical biology 2010;672:299–323. 10.1007/978-1-60761-839-3_12
  • 5. Schneider G, Fechner U. Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov 2005;4:649–63. 10.1038/nrd1799
  • 6. Kumar R, Sharma A, Alexiou A. et al. Artificial intelligence in de novo drug design: Are we still there? Curr Top Med Chem 2022;22:2483–92. 10.2174/1568026623666221017143244
  • 7. Meyers J, Fabian B, Brown N. De novo molecular design and generative models. Drug Discov Today 2021;26:2707–15. 10.1016/j.drudis.2021.05.019
  • 8. Li Q. Application of fragment-based drug discovery to versatile targets. Front Mol Biosci 2020;7:180. 10.3389/fmolb.2020.00180
  • 9. Bon M, Bilsland A, Bower J. et al. Fragment-based drug discovery—the importance of high-quality molecule libraries. Mol Oncol 2022;16:3761–77. 10.1002/1878-0261.13277
  • 10. Mureddu LG, Vuister GW. Fragment-based drug discovery by NMR. Where are the successes and where can it be improved? Front Mol Biosci 2022;9:834453. 10.3389/fmolb.2022.834453
  • 11. Xu W, Kang C. Fragment-based drug design: from then until now, and toward the future. J Med Chem 2025;68:5000–4.
  • 12. Hajduk PJ, Greer J. A decade of fragment-based drug design: strategic advances and lessons learned. Nat Rev Drug Discov 2007;6:211–9.
  • 13. Kirsch P, Hartman AM, Hirsch AKH. et al. Concepts and core principles of fragment-based drug design. Molecules 2019;24:4309. 10.3390/molecules24234309
  • 14. Grenier D, Audebert S, Preto J. et al. Linkers in fragment-based drug design: an overview of the literature. Expert Opin Drug Discovery 2023;18:987–1009. 10.1080/17460441.2023.2234285
  • 15. Loving K, Alberts I, Sherman W. Computational approaches for fragment-based and de novo design. Curr Top Med Chem 2010;10:14–32. 10.2174/156802610790232305
  • 16. Green H, Koes DR, Durrant JD. DeepFrag: a deep convolutional neural network for fragment-based lead optimization. Chem Sci 2021;12:8036–47. 10.1039/d1sc00163a
  • 17. Igashov I, Stärk H, Vignac C. et al. Equivariant 3D-conditional diffusion model for molecular linker design. Nat Mach Intell 2024;6:417–27. 10.1038/s42256-024-00815-9
  • 18. Zhang Z, Min Y, Zheng S. et al. Molecule generation for target protein binding with structural motifs. In: The Eleventh International Conference on Learning Representations, 2023.
  • 19. Zhang O, Huang Y, Cheng S. et al. FragGen: towards 3D geometry reliable fragment-based molecular generation. Chem Sci 2024;15:19452–65. 10.1039/d4sc04620j
  • 20. Xie J, Chen S, Lei J. et al. DiffDec: structure-aware scaffold decoration with an end-to-end diffusion model. J Chem Inf Model 2024;64:2554–64. 10.1021/acs.jcim.3c01466
  • 21. Ma C, Lindsley CW, Chang J. et al. Rational molecular editing: a new paradigm in drug discovery. J Med Chem 2024;67:11459–66.
  • 22. Ke W, Karapetyan E, Schloss J. et al. Advancements in small molecule drug design: a structural perspective. Drug Discov Today 2023;28:103730.
  • 23. Xue L, Bajorath J. Distribution of molecular scaffolds and R-groups isolated from large compound databases. Mol Model Ann 1999;5:97–102. 10.1007/s008940050125
  • 24. Schaub J, Zander J, Zielesny A. et al. Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK). J Cheminform 2022;14:79.
  • 25. Manelfi C, Gemei M, Talarico C. et al. “Molecular Anatomy”: a new multi-dimensional hierarchical scaffold analysis tool. J Cheminform 2021;13:1–19.
  • 26. Li Y, Jianxing H, Wang Y. et al. DeepScaffold: a comprehensive tool for scaffold-based de novo drug discovery using deep learning. J Chem Inf Model 2019;60:77–91.
  • 27. Lim J, Hwang S-Y, Moon S. et al. Scaffold-based molecular design with a graph generative model. Chem Sci 2020;11:1153–64. 10.1039/C9SC04503A
  • 28. Andriole VT. The quinolones: past, present, and future. Clin Infect Dis 2005;41:S113–9.
  • 29. Millanao AR, Mora AY, Villagra NA. et al. Biological effects of quinolones: a family of broad-spectrum antimicrobial agents. Molecules 2021;26:7153.
  • 30. Rusu A, Munteanu A-C, Arbănași E-M. et al. Overview of side-effects of antibacterial fluoroquinolones: new drugs versus old drugs, a step forward in the safety profile? Pharmaceutics 2023;15:804. 10.3390/pharmaceutics15030804
  • 31. Lei X, Li Y, Lai Y. et al. Strain-driven dyotropic rearrangement: a unified ring-expansion approach to α-methylene-γ-butyrolactones. Angew Chem Int Ed 2021;60:4221–30.
  • 32. Peng X, Luo S, Guan J. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. In: International Conference on Machine Learning, pp. 17644–55. PMLR, 2022.
  • 33. Huang L, Tingyang X, Yang Y. et al. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nat Commun 2024;15:2657. 10.1038/s41467-024-46569-1
  • 34. Mingyuan X, Ran T, Chen H. De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites. J Chem Inf Model 2021;61:3240–54. 10.1021/acs.jcim.0c01494
  • 35. Hoogeboom E, Satorras VG, Vignac C. et al. Equivariant diffusion for molecule generation in 3D. In: International Conference on Machine Learning, pp. 8867–87. PMLR, 2022.
  • 36. Satorras VG, Hoogeboom E, Welling M. E(n) equivariant graph neural networks. In: International Conference on Machine Learning, pp. 9323–32. PMLR, 2021.
  • 37. Huang L, Zhang H, Tingyang X. et al. MDM: molecular diffusion model for 3D molecule generation. In: Proceedings of the AAAI Conference on Artificial Intelligence 2023;37:5105–12. 10.1609/aaai.v37i4.25639
  • 38. Jin W, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation. In: International Conference on Machine Learning, pp. 2323–32. PMLR, 2018.
  • 39. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 2020;33:6840–51.
  • 40. Wang J, Dokholyan NV. Multimodal bonds reconstruction towards generative molecular design. J Chem Inf Model 2025.
  • 41. O’Boyle NM, Banck M, James CA. et al. Open Babel: an open chemical toolbox. J Cheminform 2011;3:1–14.
  • 42. Francoeur PG, Masuda T, Sunseri J. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J Chem Inf Model 2020;60:4200–15. 10.1021/acs.jcim.0c00411
  • 43. Luo S, Guan J, Ma J. et al. A 3D generative model for structure-based drug design. Advances in Neural Information Processing Systems 2021;34:6229–39.
  • 44. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 2017;35:1026–8. 10.1038/nbt.3988
  • 45. Degen J, Wegscheid-Gerlach C, Zaliani A. et al. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 2008;3:1503–7. 10.1002/cmdc.200800178
  • 46. Fialková V, Zhao J, Papadopoulos K. et al. LibINVENT: reaction-based generative scaffold decoration for in silico library design. J Chem Inf Model 2021;62:2046–63. 10.1021/acs.jcim.1c00469
  • 47. Polykovskiy D, Zhebrak A, Sanchez-Lengeling B. et al. Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front Pharmacol 2020;11:565644.
  • 48. Ghose AK, Viswanadhan VN, Wendoloski JJ. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem 1999;1:55–68. 10.1021/cc9800071
  • 49. Lipinski CA, Lombardo F, Dominy BW. et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 1997;23:3–25. 10.1016/S0169-409X(96)00423-1
  • 50. Wilcoxon F. Individual comparisons by ranking methods. Biometrics 1945;1:80–3. 10.2307/3001968
  • 51. Rosner B, Glynn RJ, Lee M-LT. The Wilcoxon signed rank test for paired comparisons of clustered data. Biometrics 2006;62:185–92. 10.1111/j.1541-0420.2005.00389.x
  • 52. Dorna V, Subhalingam D, Kolluru K. et al. TAGMol: target-aware gradient-guided molecule generation. In: ICML'24 Workshop ML for Life and Material Science: From Theory to Industry Applications, 2024.
  • 53. Ziv Y, Imrie F, Marsden B. et al. MolSnapper: conditioning diffusion for structure-based drug design. J Chem Inf Model 2025;65:4263–73. 10.1021/acs.jcim.4c02008
  • 54. Kerns EH, Di L. Drug-Like Properties: Concepts, Structure Design and Methods: From ADME to Toxicity Optimization. 2008.
  • 55. Mao F, Ni W, Xiang X. et al. Chemical structure-related drug-like criteria of global approved drugs. Molecules 2016;21:75. 10.3390/molecules21010075
  • 56. Wang Y, Michael S, Yang S-M. et al. Retro drug design: from target properties to molecular structures. J Chem Inf Model 2022;62:2659–69. 10.1021/acs.jcim.2c00123
  • 57. Kralj S, Jukič M, Bren U. Molecular filters in medicinal chemistry. Encyclopedia 2023;3:501–11. 10.3390/encyclopedia3020035
  • 58. Catto M, Pisani L, de La Mora E. et al. Chiral separation, X-ray structure, and biological evaluation of a potent and reversible dual binding site AChE inhibitor. ACS Med Chem Lett 2020;11:869–76. 10.1021/acsmedchemlett.9b00656
  • 59. Norouzi R, Norouzi R, Abbasi K. et al. DFT_ANPD: a dual-feature two-sided attention network for anticancer natural products detection. Comput Biol Med 2025;194:110442.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
