Abstract
Accurate prediction of molecular properties is crucial for selecting compounds with ideal properties and reducing the costs and risks of trials. Traditional methods based on manually crafted features and graph-based methods have shown promising results in molecular property prediction. However, traditional methods rely on expert knowledge and often fail to capture the complex structures and interactions within molecules. Similarly, graph-based methods typically overlook the chemical structure and function hidden in molecular motifs and struggle to effectively integrate global and local molecular information. To address these limitations, we propose a novel fingerprint-enhanced hierarchical graph neural network (FH-GNN) for molecular property prediction that simultaneously learns information from hierarchical molecular graphs and fingerprints. The FH-GNN captures diverse hierarchical chemical information by applying directed message-passing neural networks (D-MPNN) on a hierarchical molecular graph that integrates atomic-level, motif-level, and graph-level information along with their relationships. Additionally, we used an adaptive attention mechanism to balance the importance of hierarchical graphs and fingerprint features, creating a comprehensive molecular embedding that integrated hierarchical molecular structures with domain knowledge. Experiments on eight benchmark datasets from MoleculeNet showed that FH-GNN outperformed the baseline models in both classification and regression tasks for molecular property prediction, validating its capability to comprehensively capture molecular information. By integrating molecular structure and chemical knowledge, FH-GNN provides a powerful tool for the accurate prediction of molecular properties and aids in the discovery of potential drug candidates.
Keywords: Deep learning, Hierarchical molecular graph, Molecular fingerprint, Molecular property prediction, Directed message-passing neural network
Graphical abstract
Highlights
• We propose FH-GNN for molecular property prediction by integrating hierarchical structures and chemical knowledge.
• Hierarchical molecular graphs combine atomic-level, motif-level, and graph-level information.
• FH-GNN outperforms baseline models on eight datasets in classification and regression tasks for molecular property prediction.
1. Introduction
Drug discovery is a multifaceted and complex process that often encounters failure owing to off-target effects and adverse reactions [[1], [2], [3], [4], [5], [6]]. Accurate prediction of molecular biological and chemical properties is essential for drug discovery and design processes [7]. By predicting properties such as biological activity, toxicity, solubility, and stability, researchers can identify potential failures in the early stages, thereby reducing costs and risks. However, traditional experimental procedures are time-consuming and expensive, which greatly limits their application [8,9]. In recent years, computer-aided methods for drug property prediction have attracted considerable attention because of their ability to assist researchers in evaluating molecular properties more rapidly and accurately, ultimately enhancing control over the safety and efficacy of drugs [3,[10], [11], [12]].
Traditional methods for drug property prediction rely on manually crafted features such as chemical descriptors [13] and molecular fingerprints [14]. Chemical descriptors are numerical values that describe molecular properties, including structural and physicochemical characteristics. Molecular fingerprints are binary strings or bit vectors that indicate the presence or absence of specific structures or features in drug molecules. Chemical descriptors and molecular fingerprints can be used as inputs for machine learning algorithms such as naive Bayes (NB) [15], random forest (RF) [16], extreme gradient boosting (XGBoost) [17], support vector machine (SVM), and deep neural networks (DNN) [18], to predict drug properties. However, manually crafted molecular features fail to capture all relevant molecular information, particularly complex structural features and interactions. These methods rely on existing chemical knowledge and may not be generalizable to novel chemical structures, thereby limiting their applicability to diverse datasets or new problems.
In recent years, deep learning methods, particularly models using graph neural networks (GNNs), have shown significant potential for molecular property prediction by automatically extracting valuable information from data [[19], [20], [21], [22], [23]]. These models naturally represent molecules as graphs with atoms as nodes and chemical bonds as edges. Unlike traditional methods that rely on manually crafted features, graph-based methods can better capture the topological structures and interaction information of molecules. GNNs iteratively update node representations by aggregating information from neighboring nodes, thereby enabling a comprehensive understanding of their molecular structures. Well-known GNN architectures, such as graph convolutional network (GCN) [[24], [25], [26]], graph attention network (GAT) [[27], [28], [29]], and message-passing neural network (MPNN) [30,31], have been widely applied in molecular property prediction. Later studies further integrated edge features into the message-passing process to enhance the expressive power of molecular representations, such as directed MPNN (D-MPNN) [32] and contextual MPNN (C-MPNN) [33]. However, these models primarily extract atomic-level features and ignore the information on molecular motifs that contain functional groups.
The integration of molecular motif information to augment molecular property prediction has garnered considerable interest as a means of extracting richer chemical insights from molecules. Molecular motifs are defined as frequently occurring substructural patterns. Zhang et al. [34] captured molecular motif information by introducing a graph self-supervised motif generation framework. Wang et al. [35] considered the chemical reaction relationships between molecules and calculated the molecular representation of products by summing all the reactants. Zhang et al. [36] modeled a clustering problem to learn molecular motifs and trained GNNs using graph-to-subgraph contrastive learning. Zang et al. [37] introduced a pre-trained hierarchical molecular graph self-supervised learning (HiMol) framework to encode molecules, capture motif structures, and extract multilevel molecular representations. Han et al. [38] proposed a hierarchical molecular GNN (HimGNN) with a transformer-based local augmentation module to learn hierarchical molecular representations by integrating the atom and motif information. Combining atomic- and motif-based graph information provides a comprehensive understanding of both global and local molecular information. However, most models lack a global structure to effectively integrate atom and motif information and neglect the interaction between these two levels. Although GNNs have shown strong capabilities in extracting hidden chemical information from molecular structures, whether they outperform traditional models based on manually crafted features for molecular property prediction tasks remains controversial. Some studies claimed that GNNs were superior [39,40], while others presented the opposite conclusion [41,42]. Deng et al. [3] indicated that no single representation method consistently performed best across all tasks and that deep learning methods were not always stronger than traditional methods.
To combine the advantages of hierarchical molecular graphs and molecular fingerprints, we propose a novel fingerprint-enhanced hierarchical GNN (FH-GNN) to predict molecular properties. The FH-GNN consists of three main modules: a hierarchical molecular graph representation module, a molecular fingerprint encoding module, and a fusion and prediction module. First, we constructed hierarchical molecular graphs that integrated atomic-, motif-, and graph-level information, along with the relationships between these levels. The D-MPNN within the hierarchical molecular graph representation module was used to learn the node and edge embeddings from the hierarchical molecular graphs, effectively capturing both the global and local features of the molecular structures. Simultaneously, we employed a molecular fingerprint encoding module to extract the fingerprint features of the molecules, which complemented the hierarchical graph representation and further enhanced the prediction accuracy by providing strong priors. Finally, we used an adaptive attention mechanism [43] in the fusion and prediction module to integrate the hierarchical molecular graph features with the fingerprint features, thereby enabling the model to learn a more comprehensive representation. The adaptive attention mechanism adjusts the importance of different features dynamically, thereby improving the integration of multiple information sources. By leveraging a comprehensive understanding of molecular structures and chemicals, FH-GNN enhances the efficiency and precision of molecular property predictions, thereby serving as a powerful tool for the discovery of potential drug candidates.
2. Experimental
2.1. Datasets
To compare FH-GNN with existing methods, we conducted molecular property prediction experiments on eight benchmark datasets from MoleculeNet [40], which include five classification tasks and three regression tasks. These tasks cover a wide range of molecular properties, such as physical chemistry, biophysics, and physiology. Table 1 provides statistical information on these datasets. BACE is a single-task classification dataset of human β-secretase 1 inhibitors, comprising 1,513 compounds. BBBP is a dataset on the ability of molecules to penetrate the blood–brain barrier (BBB), containing 2,039 molecules with binary labels. Accurate prediction of barrier penetration ability is crucial for drug development targeting the central nervous system. Tox21 is a database comprising multi-task toxicity profiles of compounds, encompassing 7,831 molecules targeting 12 distinct drug toxicity-related receptors. SIDER is a database of marketed drugs and adverse drug reactions, with side effects grouped into 27 system organ classes based on Medical Dictionary for Regulatory Activities (MedDRA) classifications, covering 1,427 approved drugs. ClinTox compares U.S. Food and Drug Administration (FDA)-approved drugs with those that failed clinical trials due to toxicity, featuring two classification tasks for 1,477 compounds. ESOL is a dataset of the water solubility of 1,128 compounds. FreeSolv provides experimental and calculated hydration free energies for 642 small molecules in water. Lipophilicity is sourced from the ChEMBL database, providing experimental results of the octanol/water distribution coefficient (logD at pH 7.4) for 4,200 compounds. We removed molecules that were not recognized by RDKit, as well as single-atom molecules, from the datasets.
Table 1.
Statistics of datasets.
| Dataset | The number of tasks | The number of molecules | The average number of atoms of molecules | The average number of motifs of molecules | Tasks type |
|---|---|---|---|---|---|
| BACE | 1 | 1,513 | 34 | 10 | Classification |
| BBBP | 1 | 2,039 | 24 | 6 | Classification |
| Tox21 | 12 | 7,831 | 19 | 5 | Classification |
| SIDER | 27 | 1,427 | 34 | 9 | Classification |
| ClinTox | 2 | 1,477 | 26 | 7 | Classification |
| ESOL | 1 | 1,128 | 13 | 4 | Regression |
| FreeSolv | 1 | 642 | 9 | 2 | Regression |
| Lipophilicity | 1 | 4,200 | 27 | 8 | Regression |
2.2. Overview of FH-GNN framework
Fig. 1 illustrates the framework of the proposed FH-GNN model, which consists of three main modules: a hierarchical molecular graph representation module, a molecular fingerprint encoding module, and a fusion and prediction module. For a given molecule, we first utilized the breaking of retrosynthetically interesting chemical substructure (BRICS) algorithm for fragmentation to construct a hierarchical molecular graph. Next, the hierarchical molecular graph was encoded by the D-MPNN within the hierarchical molecular graph representation module to capture both the global and local features of the molecular structures. Simultaneously, we employed a molecular fingerprint encoding module to extract fingerprint features. The encoded features from the hierarchical molecular graph and the molecular fingerprints were then fed into the fusion and prediction module, which employs an adaptive attention mechanism to balance the importance of the hierarchical graph and fingerprint features and create a comprehensive molecular embedding. Finally, the generated molecular embedding was fed into a multilayer perceptron (MLP) network to predict the target property values.
Fig. 1.
Overview of the fingerprint-enhanced hierarchical graph neural network (FH-GNN) framework. (A) The hidden state of each directed edge of the hierarchical molecular graph is calculated based on incoming messages from other edges that share the same starting node, and the nodes are then updated with those hidden states. (B) Molecular fingerprints are encoded by a multilayer perceptron (MLP) with an activation function. (C) The adaptive attention mechanism dynamically assigns weights to graph-based and fingerprint-based features, determining their relative importance during the model's learning process. α_g and α_f are the learnable weights of the hierarchical molecular graph and the molecular fingerprints, respectively.
2.3. Hierarchical molecular graph representation module
Given a molecule, we first represented it as a molecular graph, where Vn is the set of atoms and En is the set of bonds. The initial atom and bond features are shown in Table 2. We represent the initial atom features as vectors of size 89 and the initial bond features as vectors of size 9. All the features were generated using the open-source package RDKit. We then decomposed the molecule into several motifs using the BRICS algorithm, as illustrated in Fig. 2. The BRICS algorithm defines 16 rules and breaks bonds corresponding to a series of chemical reactions. These bonds can be easily formed or broken using standard chemical synthesis processes. This process is similar to retrosynthesis, a method used by chemists to synthesize complex molecules by breaking them down into simpler precursor structures. BRICS retains molecular components with important structural and functional elements, such as aromatic rings. Next, we constructed a motif graph based on these motifs. All the obtained motifs were added as nodes Vm to the motif graph, and the edges Em represent the relative spatial relationships between the motifs. If an atom in one motif was adjacent to an atom in another, an edge was added between the two motifs. The motif graph was integrated into the molecular graph by adding atom-motif edges Eam between each motif node and all the atom nodes it covers. In addition, we constructed a graph-level super node Vg and connected it to all motif nodes to form motif-graph edges Emg. The features of the motif nodes and the graph-level super node were the sum of the features of all the atoms they contained. The final hierarchical molecular graph is represented as:
| G = (V, E), where V = Vn ∪ Vm ∪ {Vg} and E = En ∪ Em ∪ Eam ∪ Emg | (1) |
Table 2.
Atom and bond features used in fingerprint-enhanced hierarchical graph neural network (FH-GNN).
| Feature | Description | Length |
|---|---|---|
| Atom type | Type of atom | 60 |
| Degree | Number of heavy atom neighbors | 7 |
| Formal charge | The electronic charge assigned to the atom | 6 |
| Chirality | The chirality type of the atom | 3 |
| Number Hs | Number of bonded hydrogen atoms | 5 |
| Hybridization | The hybridization form of the atom | 7 |
| Aromaticity | The atom is or is not part of an aromatic system | 1 |
| Bond type | Type of bond | 4 |
| Conjugation | The bond is or is not conjugated | 1 |
| Ring | The bond is or is not part of a ring | 1 |
| Stereo | The stereo type of the bond | 3 |
Fig. 2.
Overview of the construction of the hierarchical molecular graph.
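To make the construction above concrete, the following is a minimal sketch (not the paper's code) of how the node and edge sets of Eq. (1) can be assembled from an atom-level bond list and a motif decomposition. The motif sets are given directly here as toy inputs; in practice they would come from RDKit's BRICS fragmentation.

```python
def build_hierarchical_graph(atom_edges, motifs):
    """atom_edges: list of (atom_i, atom_j) bonds (En);
    motifs: list of atom-index sets produced by fragmentation (Vm)."""
    atom_nodes = sorted({a for e in atom_edges for a in e})
    motif_nodes = [f"m{i}" for i in range(len(motifs))]
    super_node = "g"  # graph-level super node Vg

    # Motif-motif edges Em: two motifs are linked if any of their atoms are bonded.
    motif_edges = []
    for i, mi in enumerate(motifs):
        for j in range(i + 1, len(motifs)):
            if any((a, b) in atom_edges or (b, a) in atom_edges
                   for a in mi for b in motifs[j]):
                motif_edges.append((f"m{i}", f"m{j}"))

    # Atom-motif edges Eam: each motif node connects to every atom it covers.
    atom_motif_edges = [(a, f"m{i}") for i, m in enumerate(motifs) for a in m]

    # Motif-graph edges Emg: the super node connects to all motif nodes.
    motif_graph_edges = [(m, super_node) for m in motif_nodes]

    nodes = atom_nodes + motif_nodes + [super_node]
    edges = atom_edges + motif_edges + atom_motif_edges + motif_graph_edges
    return nodes, edges
```

For a four-atom chain split into two motifs, `build_hierarchical_graph([(0, 1), (1, 2), (2, 3)], [{0, 1}, {2, 3}])` yields seven nodes (four atoms, two motifs, one super node) with a motif-motif edge induced by the bond between atoms 1 and 2.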
We used D-MPNN, a variation of the traditional MPNN, to learn the node and edge embeddings from the hierarchical molecular graph. The D-MPNN considers directed edges and learns the hierarchical molecular graph encoding through edge-centered convolutions instead of node-centered convolutions, thus avoiding unnecessary loops during the message-passing phase. Before message passing, the initial features of each directed edge (v, w) are obtained via a simple concatenation of the atom features x_v of the bond's starting atom v with the corresponding undirected bond features e_vw. These initial directed edge features are then passed through a neural network layer with learnable weights W_i and a non-linear activation function to construct the hidden directed edge features h_vw^0.
| h_vw^0 = τ(W_i · cat(x_v, e_vw)) | (2) |
where cat(·, ·) denotes simple concatenation and τ is the rectified linear unit (ReLU) activation function.
In the message passing phase, messages are passed between directed edges. The hidden state of each directed edge is iteratively updated based on the incoming messages from other edges that share the same starting node. This process respects the directionality of the edges, ensuring that the flow of information follows the correct path as dictated by the hierarchical molecular graph structure.
| h_vw^(t+1) = τ(h_vw^0 + W_m · Σ_{k∈N(v)\{w}} h_kv^t) | (3) |
where N(v)\{w} denotes the neighbors of node v excluding node w, and W_m is a learnable weight matrix.
After message passing, the hidden states of all incoming directed edges to a node are summed to update the node features. The concatenation of the initial atomic features x_v and the sum of the hidden states of all incoming directed edges is passed through a neural network layer with learnable weights W_a and a non-linear activation function to produce the updated atomic features h_v. This aggregation ensures that the node features incorporate information from all the connected edges.
| h_v = τ(W_a · cat(x_v, Σ_{w∈N(v)} h_wv)) | (4) |
Finally, the hidden states of all nodes in the hierarchical molecular graph are aggregated to produce a comprehensive feature vector h_G for the entire molecular graph.
| h_G = Σ_{v∈V} h_v | (5) |
2.4. Molecular fingerprint encoding module
Molecular fingerprints encode molecules into bit or binary strings based on various established rules to indicate the presence or absence of specific structural features or substructures within a molecule. Common types of molecular fingerprints include substructure-based fingerprints, topological or path-based fingerprints, circular fingerprints, and pharmacophore fingerprints. In our study, we employed five different fingerprint encoding techniques to represent molecular structural information: AtomPairs fingerprints, MACCS fingerprints, MorganBits fingerprints, MorganCounts fingerprints, and pharmacophore fingerprints. AtomPairs fingerprints are defined as substructure fragments consisting of two non-hydrogen atoms and an interatomic separation measured in bonds along the shortest path connecting these two atoms. MACCS fingerprints use a predefined set of keys to encode molecular substructures and are available in two main variants: one with 166 bits and the other with 960 bits. In this study, the shorter variant of 166 bits was selected. Morgan fingerprints capture the neighborhood information of each atom in a molecule by considering the features of adjacent atoms within a specified radius. Common variants include MorganBits and MorganCounts. MorganBits uses fixed-length bit vectors to represent molecules, with each bit corresponding to the presence or absence of a specific substructure. By contrast, MorganCounts uses a hash table or sparse vector to record the frequency of each substructure in the molecule. We set the radius of the Morgan fingerprint to 2 and the number of bits to 2048. Pharmacophore fingerprints encode the pharmacophoric properties of molecules. Each fingerprinting technique generates a fixed-length vector representation for each molecule, in which each position encodes the presence (or, for MorganCounts, the frequency) of a specific molecular substructure.
These fingerprint types represent specific sets of complementary molecular properties and collectively provide a comprehensive representation of molecular features.
We concatenated five complementary fingerprints and input them into the molecular fingerprint encoding module.
| FP = cat(FP_AP, FP_MACCS, FP_MB, FP_MC, FP_Ph) | (6) |
| h_FP = τ(W_f · FP + b_f) | (7) |
where FP_AP denotes the AtomPairs fingerprints, FP_MACCS the MACCS fingerprints, FP_MB the MorganBits fingerprints, FP_MC the MorganCounts fingerprints, and FP_Ph the pharmacophore fingerprints; τ is the ReLU activation function, W_f is a weight matrix, and b_f is a bias vector. The molecular fingerprint encoding module utilizes an MLP with ReLU activation functions to learn a nonlinear mapping from the input space to the latent feature space. This module extracts meaningful fingerprint features from the input molecular fingerprints, which are indispensable for the accurate prediction of molecular properties.
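A minimal sketch (not the paper's implementation) of Eqs. (6)-(7): five fingerprint vectors are concatenated and passed through one linear layer with a ReLU activation. The fingerprint bits and weights below are toy values; in practice the five fingerprints would be computed with RDKit and the weights learned by training.

```python
def encode_fingerprints(fps, weights, bias):
    """fps: list of fingerprint vectors; weights: rows of W_f; bias: b_f."""
    x = [bit for fp in fps for bit in fp]  # Eq. (6): concatenation
    # Eq. (7): h_FP = ReLU(W_f * x + b_f), computed row by row.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]
```

For example, with five toy 2-bit fingerprints and a 2-row weight matrix, `encode_fingerprints([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]], [[0.1] * 10, [-0.1] * 10], [0.0, 0.0])` produces a 2-dimensional latent feature whose second entry is clipped to zero by the ReLU.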
2.5. Fusion and prediction module
To ensure compatibility and consistency among the different features while minimizing redundancy, a fusion and prediction module was designed to acquire joint representations and predict the target property values. For a given molecule, we obtained its hierarchical molecular graph features and molecular fingerprint features using the previously described steps. These encoded features are then input into the fusion and prediction module, which employs an adaptive attention mechanism to integrate the two types of features, thereby creating a more comprehensive molecular embedding. Hierarchical graph features capture the structural and relational properties of molecules at various levels (atomic, motif, and graph levels), whereas molecular fingerprints represent the presence of specific chemical substructures or functional groups. Because each feature set may have different levels of relevance for various tasks, an adaptive attention mechanism can dynamically adjust the importance of each feature, depending on the context or specific task. In other words, the use of an adaptive attention mechanism to fuse hierarchical graph features and fingerprint features enables the model to dynamically prioritize relevant information, reduce redundancy, and adapt to the complexity of molecular data. Finally, the generated molecular embedding is fed into an MLP with ReLU activation functions to predict the target property values ŷ.
| ŷ = MLP(α_g · h_G + α_f · h_FP) | (8) |
where α_g and α_f are the learnable weights of the hierarchical molecular graph features and the molecular fingerprint features, respectively, and b is a bias vector of the prediction MLP.
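The dynamic weighting can be illustrated with a toy sketch in which two learnable scores are softmax-normalized into the weights α_g and α_f before scaling the graph and fingerprint embeddings. The exact attention parameterization in the paper may differ; this only shows the mechanism by which one feature source can be prioritized over the other.

```python
import math

def fuse(h_graph, h_fp, score_g, score_f):
    """h_graph, h_fp: equal-length embeddings; score_g, score_f: learnable scalars."""
    zg, zf = math.exp(score_g), math.exp(score_f)
    a_g, a_f = zg / (zg + zf), zf / (zg + zf)  # weights sum to 1
    fused = [a_g * g + a_f * f for g, f in zip(h_graph, h_fp)]
    return fused, (a_g, a_f)
```

With equal scores the two sources contribute equally (α_g = α_f = 0.5); raising one score shifts the fused embedding toward that source, which is what allows the model to adapt the balance per task.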
2.6. Evaluation metrics
For classification tasks, we used the area under the receiver operating characteristic curve (ROC-AUC) (the larger the better) as the evaluation metric. ROC-AUC evaluates the trade-off between the true positive rate and the false positive rate, measuring the model's ability to distinguish between classes. For regression tasks, we used the root mean square error (RMSE) (the smaller the better) to evaluate the predictive performance of the constructed models. RMSE provides an overall measure of the prediction error and reflects how closely the predictions match the actual values.
| RMSE = √((1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²) | (9) |
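Both metrics are straightforward to implement in plain Python. The sketch below follows Eq. (9) for RMSE and uses the pairwise-ranking formulation of ROC-AUC (the probability that a random positive is scored above a random negative, with ties counting one half), which is mathematically equivalent to the area under the ROC curve.

```python
import math

def rmse(y_true, y_pred):
    """Eq. (9): root mean square error over N predictions."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def roc_auc(labels, scores):
    """ROC-AUC via pairwise ranking of positive vs. negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly ranked set of scores gives ROC-AUC of 1.0, and completely tied scores give 0.5, matching the usual interpretation of the metric.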
2.7. Baselines
To verify the performance of our method, we compared FH-GNN with ten popular baseline methods, selected because they share design similarities with our approach. These methods include three that integrate molecular fingerprints (AttentiveFP, fingerprints and GNN (FP-GNN), and dual-GNNs contrastive learning (DGCL)), four that consider motif information (motif-based graph self-supervised learning (MGSSL), hierarchical molecular graph self-supervised learning (HiMol), HimGNN, and Mix-Key), and three that involve message-passing techniques (D-MPNN, C-MPNN, and graph representation from self-supervised message-passing transformer (GROVER)). By including these diverse yet relevant methods, we aim to provide a comprehensive evaluation of our model's performance compared with existing approaches. All of the baseline methods are summarized as follows:
i) AttentiveFP [29] combined graph convolutional layers and attention mechanisms for molecular property prediction.
ii) MGSSL [34] captured molecular motif information by introducing a graph self-supervised motif generation framework.
iii) FP-GNN [44] integrated molecular graph features extracted by GATs with various molecular fingerprints to predict molecular properties.
iv) D-MPNN [32] used messages associated with directed bonds rather than with atoms to avoid unnecessary loops in message passing.
v) C-MPNN [33] introduced a communication MPNN to enhance message interactions between atoms and bonds.
vi) GROVER [45] integrated message-passing networks into the transformer architecture, enhancing molecular representation learning with carefully designed self-supervised tasks at the node, edge, and graph levels.
vii) HiMol [37] introduced a pre-trained hierarchical molecular graph self-supervised learning framework to encode molecules, capturing motif structures and extracting multi-level molecular representations.
viii) HimGNN [38] designed a transformer-based local augmentation module to learn hierarchical molecular representations by integrating atom and motif information.
ix) Mix-Key [46] was a data augmentation method for molecular property prediction that effectively captures the interactions between molecular scaffolds and functional groups.
x) DGCL [47] combined features extracted through a self-supervised contrastive learning method based on dual-GNNs with mixed molecular fingerprints to predict molecular properties.
3. Results and discussion
3.1. Prediction performance of FH-GNN and comparison with baselines
To comprehensively evaluate the effectiveness of the FH-GNN, we conducted molecular property prediction experiments on eight benchmark datasets from MoleculeNet. We used scaffold splitting to split the datasets into training, validation, and test sets in a ratio of 8:1:1. Scaffold splitting, which separates molecules based on their two-dimensional structures, poses a more challenging and realistic scenario than random splitting. For a fair comparison, we performed five independent runs using different random seeds and calculated the mean and standard deviation of the ROC-AUC and RMSE metrics. All the experiments were conducted on a Linux server with Nvidia GeForce RTX 3080 Ti (NVIDIA, Santa Clara, CA, USA) and Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40 GHz. The FH-GNN was trained for 100 epochs using the Adam optimizer. The selection of key hyperparameters for all datasets is presented in Table S1.
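Scaffold splitting as described above keeps molecules with the same scaffold in the same subset. A hedged sketch of the idea: molecules are grouped by a scaffold key and whole groups are assigned greedily, largest first, to the training/validation/test sets under the 8:1:1 budget. In practice the keys would be Bemis-Murcko scaffold SMILES computed with RDKit; here they are arbitrary strings, and the exact assignment policy used by the paper may differ.

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """scaffolds: one scaffold key per molecule; returns index lists."""
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)  # molecules sharing a scaffold stay together
    train, valid, test = [], [], []
    n = len(scaffolds)
    # Assign whole scaffold groups, largest first, filling train then valid.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(valid) + len(group) <= frac_valid * n:
            valid += group
        else:
            test += group
    return train, valid, test
```

Because entire scaffold groups are held out, the test set contains two-dimensional structures never seen during training, which is what makes this split more challenging than a random one.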
Table 3 [29,33,34,37,38,46,47] shows the results of FH-GNN on the eight benchmark datasets. Our FH-GNN model achieved the best prediction performance on two datasets, BBBP and Tox21, and the second-best performance on the remaining datasets. In classification tasks, FH-GNN outperformed DGCL [47] on two datasets and achieved the best average performance across the five datasets, with an average ROC-AUC of 0.861, which was 3.11% higher than that of DGCL. DGCL performed particularly poorly on the BBBP dataset, with a 22.8% decrease in ROC-AUC compared with FH-GNN. These results demonstrate that FH-GNN is both robust and generalizable, providing stable performance across a diverse range of datasets. In regression tasks, FH-GNN performed slightly worse than Mix-Key [46], with an average RMSE of 0.968, which was 0.047 higher than that of Mix-Key. However, Mix-Key struggled with the classification tasks, with FH-GNN outperforming it on all five classification datasets. The strong prediction performance on the test datasets indicates that FH-GNN effectively captures molecular features from multiple perspectives and provides a comprehensive representation of molecules by integrating hierarchical molecular graphs and molecular fingerprints.
Table 3.
Molecular property prediction performance under scaffold splitting.
| Models | Classification (ROC-AUC) ↑ | | | | | Regression (RMSE) ↓ | | | Refs. |
| | BACE | BBBP | Tox21 | SIDER | ClinTox | ESOL | FreeSolv | Lipophilicity | |
|---|---|---|---|---|---|---|---|---|---|
| AttentiveFP | 0.784 (0.000) | 0.908 (0.050) | 0.807 (0.020) | 0.605 (0.060) | 0.933 (0.020) | 0.877 (0.029) | 2.073 (0.183) | 0.721 (0.001) | [29] |
| C-MPNN | 0.821 (0.006) | 0.927 (0.002) | 0.806 (0.016) | 0.616 (0.003) | 0.902 (0.012) | 0.845 (0.039) | 1.833 (0.580) | 0.658 (0.029) | [33] |
| MGSSL | 0.791 (0.009) | 0.697 (0.009) | 0.765 (0.003) | 0.618 (0.008) | 0.807 (0.021) | 1.346 | 2.980 | 0.751 | [34] |
| HiMol (best) | 0.846 (0.002) | 0.732 (0.008) | 0.762 (0.003) | 0.625 (0.003) | 0.808 (0.014) | 0.833 | 2.283 | 0.708 | [37] |
| FP-GNN | 0.845 (0.028) | 0.910 (0.027) | 0.803 (0.024) | 0.598 (0.014) | 0.765 (0.038) | 1.282 (0.332) | 2.492 (0.649) | 0.679 (0.152) | [38] |
| GROVER | 0.835 (0.044) | 0.911 (0.008) | 0.806 (0.017) | 0.621 (0.006) | 0.882 (0.013) | 0.921 (0.092) | 2.026 (0.127) | 0.646 (0.021) | [38] |
| D-MPNN | 0.823 (0.038) | 0.911 (0.048) | 0.808 (0.023) | 0.610 (0.027) | 0.879 (0.040) | 0.972 (0.097) | 2.170 (0.536) | 0.652 (0.051) | [38] |
| HimGNN | 0.856 (0.034) | 0.928 (0.027) | 0.807 (0.017) | 0.642 (0.023) | 0.917 (0.030) | 0.870 (0.154) | 1.921 (0.474) | 0.632 (0.016) | [38] |
| Mix-Key (best) | 0.849 (0.021) | 0.845 (0.034) | 0.830 (0.002) | 0.640 (0.007) | 0.855 (0.022) | 0.755 (0.031) | 1.270 (0.052) | 0.737 (0.005) | [46] |
| DGCL | 0.915 (0.017) | 0.738 (0.006) | 0.772 (0.003) | 0.781 (0.022) | 0.971 (0.029) | 1.046 (0.050) | 2.080 (0.027) | 0.477 (0.031) | [47] |
| FH-GNN | 0.891 (0.008) | 0.956 (0.010) | 0.842 (0.002) | 0.665 (0.008) | 0.953 (0.007) | 0.785 (0.017) | 1.515 (0.086) | 0.604 (0.013) | |
↑ means that the higher result is better and ↓ means that the lower result is better. The best score in each column is in bold and the second best score is underlined. ROC-AUC: area under the receiver operating characteristic curve; RMSE: root mean square error. C-MPNN: contextual message-passing neural networks; MGSSL: motif-based graph self-supervised learning; HiMol: hierarchical molecular graph self-supervised learning; FP-GNN: fingerprints and graph neural networks (GNN); GROVER: graph representation from self-supervised message-passing transformer; D-MPNN: directed message-passing neural networks; HimGNN: hierarchical molecular GNNs; DGCL: dual-GNNs contrastive learning; FH-GNN: fingerprint-enhanced hierarchical GNN.
3.2. Ablation study
Ablation experiments were conducted during the training process to analyze the effectiveness of each part of the FH-GNN framework. Under the same experimental settings, we implemented the following five simplified variants of FH-GNN on eight benchmark datasets:
i) FH-GNN_C: simple concatenation of hierarchical molecular graph features and molecular fingerprint features.
ii) FH-GNN_F: the FH-GNN architecture without the molecular fingerprints.
iii) FH-GNN_A: constructing molecular graphs based on atoms only.
iv) FH-GNN_M: constructing molecular graphs based on motifs only.
v) FH-GNN_E: constructing hierarchical molecular graphs without atom-motif edges and motif-graph edges.
Fig. 3 illustrates the ablation experimental results of FH-GNN for both classification and regression tasks. The full FH-GNN, which integrates hierarchical molecular graphs and molecular fingerprints, achieves the best performance among all variants on almost all datasets. These results demonstrate that each component of the FH-GNN framework contributes to property prediction. First, FH-GNN outperformed or matched FH-GNN_C, except on the FreeSolv dataset. FH-GNN_C combines features in a static manner and treats all features as equally important. However, different feature types may contribute differently to the model. The adaptive attention mechanism allows FH-GNN to learn which features (or parts of the features) are more important at each stage, thus improving the model's ability to extract and combine relevant information in a flexible and task-specific manner. Second, FH-GNN_A generally outperformed FH-GNN_M, except on the BACE dataset. Fine-grained atom features provide more detailed representations than motif features in molecular representation learning. However, the positional relationships between motifs play a more crucial role than atom features in determining the binding results for inhibitors of human β-secretase 1. This highlights the complementary nature of atom and motif features for property prediction. Third, the performance of FH-GNN_E degraded across all benchmarks, suggesting the importance of establishing connections between different hierarchical levels in molecular representation learning. Cross-layer connections allow for the exchange of information between the atom, motif, and graph levels, ensuring that the model comprehensively considers information from different hierarchies.
In summary, our ablation studies confirm that the integration of hierarchical molecular graph structures and domain knowledge into the FH-GNN framework significantly enhances the performance and robustness of the model in both classification and regression tasks.
Fig. 3.
The ablation experimental results for (A) classification and (B) regression tasks. ROC-AUC: area under the receiver operating characteristic curve; FH-GNN_C: simple concatenation fingerprint-enhanced hierarchical graph neural network (FH-GNN); FH-GNN_F: FH-GNN architecture without the molecular fingerprint; FH-GNN_A: FH-GNN based on atoms; FH-GNN_M: FH-GNN based on motif; FH-GNN_E: FH-GNN without atom-motif edges and motif-graph edges; RMSE: root mean square error.
In addition, we explored two other fragmentation methods: one that breaks the outer ring bond (FH-GNN_R) and another that breaks both the outer ring bond and the bonds identified by the BRICS algorithm (FH-GNN_BR). Fig. 4 shows the results of the different fragmentation methods for both classification and regression tasks. FH-GNN outperformed or matched FH-GNN_R and FH-GNN_BR across all regression datasets and three classification datasets. Notably, FH-GNN_R and FH-GNN_BR outperformed FH-GNN on specific datasets, achieving a ROC-AUC of 0.932 on the BACE dataset and 0.974 on the ClinTox dataset, respectively. However, when averaged across all datasets, FH-GNN consistently outperformed both FH-GNN_R and FH-GNN_BR in both classification and regression tasks. These findings suggest that the performance of different fragmentation methods may vary depending on the dataset. In general, the BRICS-based fragmentation method effectively captures the local structural information of molecules and demonstrates stable performance across all datasets.
Fig. 4.
The experimental results of different fragmentation methods for (A) classification and (B) regression tasks. ROC-AUC: area under the receiver operating characteristic curve; FH-GNN_R: fingerprint-enhanced hierarchical graph neural network (FH-GNN) breaking the outer ring bond; FH-GNN_BR: FH-GNN breaking both the outer ring bond and the bonds identified by the breaking of retrosynthetically interesting chemical substructures (BRICS) algorithm; RMSE: root mean square error.
3.3. Visualization analysis
To demonstrate the molecular representation learning ability of FH-GNN, we used t-distributed stochastic neighbor embedding (t-SNE) to visualize molecular representations. Fig. 5 illustrates the visualization of molecular representations before and after training on the BBBP and BACE datasets. Before training, the molecular representations exhibit a chaotic and overlapping spatial distribution, offering no basis for classification. After training with FH-GNN, molecules from different classes show a clear separation trend, with those labeled 1 and 0 clustering in the top-left and bottom-right corners, respectively. In contrast, although HimGNN also shows some separation between classes after training, the majority of molecular representations still overlap chaotically. These visualization results indicate that FH-GNN achieves better molecular representations than HimGNN. FH-GNN accurately captures the relationship between molecular structures and their physicochemical properties, achieving a meaningful distinction between different classes of molecules, because it integrates information from multiple molecular levels with domain knowledge.
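A t-SNE plot of this kind can be produced as sketched below with scikit-learn. The embeddings here are synthetic stand-ins for illustration; in practice, the inputs would be FH-GNN's learned molecular representations, and scikit-learn availability is assumed.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
# Stand-in for learned molecular embeddings: two loosely separated classes
emb = np.vstack([rng.normal(0.0, 1.0, size=(50, 64)),
                 rng.normal(3.0, 1.0, size=(50, 64))])
labels = np.array([0] * 50 + [1] * 50)

# Project to 2-D for plotting; perplexity must stay below the sample count
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(emb)
print(coords.shape)  # (100, 2)
```

The resulting `coords` can then be scattered with matplotlib, coloring each point by its class label, to reproduce the kind of before/after comparison shown in Fig. 5.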
Fig. 5.
t-Distributed stochastic neighbor embedding (t-SNE) visualization of molecular representations for (A) the BBBP dataset and (B) the BACE dataset. The left column shows the initial molecular representations, while the middle and right columns show the representations after training with fingerprint-enhanced hierarchical graph neural network (FH-GNN) and hierarchical molecular GNN (HimGNN), respectively. Orange dots denote positive labels, and blue dots denote negative labels.
3.4. Case study
To investigate the relationship between molecular structures and attention weights in different models, we conducted interpretability case studies on two molecules selected from the BBBP dataset. The BBBP dataset provides information on the BBB permeability of various compounds. Accurately predicting the BBB permeability of molecules is crucial for developing central nervous system drugs and avoiding related side effects or toxicity. Fig. 6 illustrates the roles of specific molecular components in influencing BBB permeability in different models. FH-GNN_A constructs molecular graphs based on atoms, FH-GNN_M constructs them based on motifs, and FH-GNN constructs hierarchical molecular graphs. Blue indicates atoms that facilitate the molecule's ability to cross the BBB, while red indicates atoms that hinder this ability. The portions of the molecule with deeper colors are more significant in predicting whether the molecule can cross the BBB. Note that because the molecular fingerprint encoding module cannot provide representations at the atomic level, these experiments were conducted on the molecular graph representation module.
Fig. 6.
Interpretability analysis on the BBBP dataset for (A) a molecule that can cross the blood-brain barrier (BBB) and (B) a molecule that cannot, in different models. Blue indicates atoms that facilitate the molecule's ability to cross the BBB, while red indicates atoms that hinder this ability. The portions of the molecule with deeper colors are more significant in predicting whether the molecule can cross the BBB. FH-GNN_A: fingerprint-enhanced hierarchical graph neural network (FH-GNN) based on atoms; FH-GNN_M: FH-GNN based on motifs.
For an example molecule that can cross the BBB (Fig. 6A), the hydrophobic nitroimidazole scaffold is the primary contributor, followed by the chlorine atom. The oxygen atoms in the nitro group restrict the molecule's ability to cross the barrier, but their effect is limited. In contrast, for a molecule that cannot cross the BBB (Fig. 6B), hydrophilic groups such as carboxyl, ether, and piperazine rings play a major role in hindering permeation. These results align with the understanding that hydrophobic molecules generally penetrate the BBB more easily, while hydrophilic molecules show the opposite tendency. Across the different models, the roles of most atoms are consistent, with only a few atoms showing conflicting roles. For example, the hydroxyl group in the molecule that can cross the BBB hinders permeability in FH-GNN_A but facilitates it in FH-GNN_M. FH-GNN effectively integrates information from both atoms and motifs, aligning the model's attention weights more closely with the molecular structures. These findings provide valuable insights into the relationship between molecular structure and properties. The interpretability provided by the model's attention weights can help researchers optimize molecules for better BBB permeability, ultimately guiding the development of drugs targeting central nervous system diseases.
4. Conclusions
In this study, we propose a novel FH-GNN for molecular property prediction. Hierarchical molecular graphs integrate atomic-, motif-, and graph-level information, providing rich structural features. Molecular fingerprint features complement hierarchical graph representations by providing strong prior knowledge. FH-GNN integrates information from both hierarchical molecular graphs and molecular fingerprints to generate comprehensive molecular representations, thereby enhancing prediction accuracy. Extensive experiments demonstrated that FH-GNN outperformed baseline models for molecular property prediction, highlighting its ability to capture detailed molecular information. Ablation, visualization, and case studies confirmed the effectiveness of combining hierarchical molecular graph information with molecular fingerprints. However, despite the advantages of hierarchical molecular graphs in simplifying complex molecular structures, challenges such as computational resources, model complexity, and data quality remain when dealing with very large datasets and highly complex molecular structures. Future work will focus on optimizing the scalability of FH-GNN and exploring its applications in a broader range of drug discovery tasks.
CRediT authorship contribution statement
Shuo Liu: Writing – original draft, Visualization, Validation, Methodology, Investigation, Data curation, Conceptualization. Mengyun Chen: Writing – review & editing. Xiaojun Yao: Writing – review & editing, Conceptualization. Huanxiang Liu: Writing – review & editing, Project administration, Conceptualization.
Data and code availability
All data used in this paper are publicly available from the MoleculeNet website at https://github.com/deepchem/deepchem/tree/master/deepchem/molnet/load_function. All code for FH-GNN is available at https://github.com/shuoliu0-0/FH-GNN.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: the author Shuo Liu was a temporary intern at Huawei Technologies Co., Ltd. and received guidance from Huawei employee Mengyun Chen during this research.
Acknowledgments
This work was supported by Macao Science and Technology Development Fund, Macao SAR, China (Grant No.: 0043/2023/AFJ), the National Natural Science Foundation of China (Grant No.: 22173038), and Macao Polytechnic University, Macao SAR, China (Grant No.: RP/FCA-01/2022). We also thank the Supercomputing Center of Lanzhou University (China) for providing high-performance computing resources.
Footnotes
Peer review under responsibility of Xi'an Jiaotong University.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jpha.2025.101242.
Contributor Information
Xiaojun Yao, Email: xjyao@mpu.edu.mo.
Huanxiang Liu, Email: hxliu@mpu.edu.mo.
Appendix A. Supplementary data
The following is the Supplementary data to this article:
References
- 1. Jiang J., Chen L., Ke L., et al. A review of transformers in drug discovery and beyond. J. Pharm. Anal. 2024;15:101081. doi: 10.1016/j.jpha.2024.101081.
- 2. Wu T., Lin R., Cui P., et al. Deep learning-based drug screening for the discovery of potential therapeutic agents for Alzheimer's disease. J. Pharm. Anal. 2024;14. doi: 10.1016/j.jpha.2024.101022.
- 3. Deng J., Yang Z., Wang H., et al. A systematic study of key elements underlying molecular property prediction. Nat. Commun. 2023;14. doi: 10.1038/s41467-023-41948-6.
- 4. Fang X., Liu L., Lei J., et al. Geometry-enhanced molecular representation learning for property prediction. Nat. Mach. Intell. 2022;4:127–134.
- 5. Chen D., Gao K., Nguyen D.D., et al. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nat. Commun. 2021;12. doi: 10.1038/s41467-021-23720-w.
- 6. Torres L.H.M., Ribeiro B., Arrais J.P. Multi-scale cross-attention transformer via graph embeddings for few-shot molecular property prediction. Appl. Soft Comput. 2024;153.
- 7. Hughes J.P., Rees S., Kalindjian S.B., et al. Principles of early drug discovery. Br. J. Pharmacol. 2011;162:1239–1249. doi: 10.1111/j.1476-5381.2010.01127.x.
- 8. Fu L., Shi S., Yi J., et al. ADMETlab 3.0: An updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res. 2024;52:W422–W431. doi: 10.1093/nar/gkae236.
- 9. Teng S., Yin C., Wang Y., et al. MolFPG: Multi-level fingerprint-based Graph Transformer for accurate and robust drug toxicity prediction. Comput. Biol. Med. 2023;164. doi: 10.1016/j.compbiomed.2023.106904.
- 10. Li Z., Jiang M., Wang S., et al. Deep learning methods for molecular representation and property prediction. Drug Discov. Today. 2022;27. doi: 10.1016/j.drudis.2022.103373.
- 11. Kim H., Lee J., Ahn S., et al. A merged molecular representation learning for molecular properties prediction with a web-based service. Sci. Rep. 2021;11. doi: 10.1038/s41598-021-90259-7.
- 12. Huang Z., Yu J., He W., et al. AI-enhanced chemical paradigm: From molecular graphs to accurate prediction and mechanism. J. Hazard. Mater. 2024;465. doi: 10.1016/j.jhazmat.2023.133355.
- 13. Moriwaki H., Tian Y., Kawashita N., et al. Mordred: A molecular descriptor calculator. J. Cheminform. 2018;10:4. doi: 10.1186/s13321-018-0258-y.
- 14. Capecchi A., Probst D., Reymond J.L. One molecular fingerprint to rule them all: Drugs, biomolecules, and the metabolome. J. Cheminform. 2020;12. doi: 10.1186/s13321-020-00445-4.
- 15. Anderson C.J., Cadeddu R., Anderson D.N., et al. A novel naïve Bayes approach to identifying grooming behaviors in the force-plate actometric platform. J. Neurosci. Methods. 2024;403. doi: 10.1016/j.jneumeth.2023.110026.
- 16. Zhang Y., Wang Y., Gu Z., et al. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front. Med. (Lausanne) 2023;10. doi: 10.3389/fmed.2023.1052923.
- 17. Chen T., Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 13−17, 2016, San Francisco, USA, pp. 785–794.
- 18. Yang X., Wang Y., Byrne R., et al. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 2019;119:10520–10594. doi: 10.1021/acs.chemrev.8b00728.
- 19. Fralish Z., Chen A., Skaluba P., et al. DeepDelta: Predicting ADMET improvements of molecular derivatives with deep learning. J. Cheminform. 2023;15. doi: 10.1186/s13321-023-00769-x.
- 20. Liu H., Huang Y., Liu X., et al. Attention-wise masked graph contrastive learning for predicting molecular property. Brief. Bioinform. 2022;23. doi: 10.1093/bib/bbac303.
- 21. Tang B., Kramer S.T., Fang M., et al. A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J. Cheminform. 2020;12. doi: 10.1186/s13321-020-0414-z.
- 22. Xu F., Yang Z., Wang L., et al. MESPool: Molecular edge shrinkage pooling for hierarchical molecular representation learning and property prediction. Brief. Bioinform. 2023;25. doi: 10.1093/bib/bbad423.
- 23. Cai L., He Y., Fu X., et al. AEGNN-M: A 3D graph-spatial co-representation model for molecular property prediction. IEEE J. Biomed. Health Inform. 2025;29:1726–1734. doi: 10.1109/JBHI.2024.3368608.
- 24. Feinberg E.N., Sur D., Wu Z., et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 2018;4:1520–1530. doi: 10.1021/acscentsci.8b00507.
- 25. Kearnes S., McCloskey K., Berndl M., et al. Molecular graph convolutions: Moving beyond fingerprints. J. Comput. Aided Mol. Des. 2016;30:595–608. doi: 10.1007/s10822-016-9938-8.
- 26. Sun M., Zhao S., Gilvary C., et al. Graph convolutional networks for computational drug development and discovery. Brief. Bioinform. 2020;21:919–935. doi: 10.1093/bib/bbz042.
- 27. Jiang S., Balaprakash P. Graph neural network architecture search for molecular property prediction. 2020 IEEE International Conference on Big Data (Big Data), December 10−13, 2020, Atlanta, USA. IEEE, pp. 1346–1353.
- 28. Zhang Z., Guan J., Zhou S. FraGAT: A fragment-oriented multi-scale graph attention model for molecular property prediction. Bioinformatics. 2021;37:2981–2987. doi: 10.1093/bioinformatics/btab195.
- 29. Xiong Z., Wang D., Liu X., et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J. Med. Chem. 2020;63:8749–8760. doi: 10.1021/acs.jmedchem.9b00959.
- 30. Gilmer J., Schoenholz S.S., Riley P.F., et al. Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning, Volume 70, August 6−11, 2017, Sydney, Australia, pp. 1263–1272.
- 31. Withnall M., Lindelöf E., Engkvist O., et al. Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction. J. Cheminform. 2020;12. doi: 10.1186/s13321-019-0407-y.
- 32. Yang K., Swanson K., Jin W., et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019;59:3370–3388. doi: 10.1021/acs.jcim.9b00237.
- 33. Song Y., Zheng S., Niu Z., et al. Communicative representation learning on attributed molecular graphs. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, January 7−15, 2021, Yokohama, Japan, pp. 2831–2838.
- 34. Zhang Z., Liu Q., Wang H., et al. Motif-based graph self-supervised learning for molecular property prediction. arXiv. 2021. doi: 10.48550/arXiv.2110.00987.
- 35. Wang H., Li W., Jin X., et al. Chemical-reaction-aware molecule representation learning. 10th International Conference on Learning Representations, April 25−29, 2022, Virtual, Online, pp. 1–18.
- 36. Zhang S., Hu Z., Subramonian A., et al. Motif-driven contrastive learning of graph representations. IEEE Trans. Knowl. Data Eng. 2024;36:4063–4075.
- 37. Zang X., Zhao X., Tang B. Hierarchical molecular graph self-supervised learning for property prediction. Commun. Chem. 2023;6. doi: 10.1038/s42004-023-00825-5.
- 38. Han S., Fu H., Wu Y., et al. HimGNN: A novel hierarchical molecular graph representation learning framework for property prediction. Brief. Bioinform. 2023;24. doi: 10.1093/bib/bbad305.
- 39. Rifaioglu A.S., Nalbat E., Atalay V., et al. DEEPScreen: High performance drug-target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem. Sci. 2020;11:2531–2557. doi: 10.1039/c9sc03414e.
- 40. Wu Z., Ramsundar B., Feinberg E.N., et al. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2017;9:513–530. doi: 10.1039/c7sc02664a.
- 41. Mayr A., Klambauer G., Unterthiner T., et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem. Sci. 2018;9:5441–5451. doi: 10.1039/c8sc00148k.
- 42. Jiang D., Wu Z., Hsieh C.Y., et al. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J. Cheminform. 2021;13. doi: 10.1186/s13321-020-00479-8.
- 43. Zhang X., Sun X., Luo Y., et al. RSTNet: Captioning with adaptive attention on visual and non-visual words. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 20−25, 2021, Nashville, USA, pp. 15460−15469.
- 44. Cai H., Zhang H., Zhao D., et al. FP-GNN: A versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 2022;23. doi: 10.1093/bib/bbac408.
- 45. Rong Y., Bian Y., Xu T., et al. Self-supervised graph transformer on large-scale molecular data. arXiv. 2020. doi: 10.48550/arXiv.2007.02835.
- 46. Jiang T., Wang Z., Yu W., et al. Mix-Key: Graph mixup with key structures for molecular property prediction. Brief. Bioinform. 2024;25. doi: 10.1093/bib/bbae165.
- 47. Jiang X., Tan L., Zou Q. DGCL: Dual-graph neural networks contrastive learning for molecular property prediction. Brief. Bioinform. 2024;25. doi: 10.1093/bib/bbae474.