ACS Omega. 2025 Jul 11;10(28):30232–30249. doi: 10.1021/acsomega.5c01443

MCST-AFN: A Multichannel Spatiotemporal Feature Adaptive Fusion Network Framework Based on a Low-Fidelity Molecular Dynamics Model

Xing Chen, Weichen Liu§, Tiantian Ruan, Jinsong Shao, Sanjeevi Pandiyan, Min Yao∥,*, Li Wang‡,*
PMCID: PMC12290669  PMID: 40727795

Abstract

The capability of predicting molecular properties plays a crucial role in drug development, and the learning of molecular representations stands as the primary step in tasks aimed at predicting molecular properties. Static three-dimensional (3D) structural information has been shown to significantly aid in molecular representation; however, molecules are in constant motion and change, implying that their properties should be closely linked with dynamic molecular conformations. Traditional four-dimensional (4D) Quantitative Structure–Property Relationship (QSPR) methods, while incorporating time as a dimension, have high computational costs and fail to fully integrate the temporal dimension, leading to ineffective integration of molecular conformation ensembles. Inspired by deep learning-based molecular dynamics (DLMD) techniques and multifidelity learning (MFL) strategies, in this work, a multichannel spatiotemporal feature adaptive fusion network framework (MCST-AFN) based on a low-fidelity molecular dynamics model is proposed. This framework integrates deep learning technology with molecular dynamics (MD) simulations, effectively enhancing molecular representation while significantly reducing computational costs. Initially, a low-fidelity molecular dynamics simulation model is trained using real molecular dynamics simulation data. Compared to existing tools such as Amber, this low-fidelity model can update atomic coordinates at a lower computational cost and output multichannel atom-level embeddings that encapsulate information across different time scales. Subsequently, an attention-based network is constructed to achieve adaptive fusion of multichannel spatiotemporal features, and a self-supervised learning task for atom masking prediction is designed to further enhance molecular representation. The MCST-AFN was tested on 13 benchmark data sets for molecular property prediction, achieving an average performance improvement of 2.10% across 12 data sets. The most significant enhancement was seen in the ESOL data set, with a performance boost of 19.70%.



Introduction

Molecular properties are crucial factors in the fields of chemistry and drug discovery. Computer-aided methods can rapidly predict molecular properties, providing an overview of the molecules under study before specific experiments commence. These methods are termed Quantitative Structure–Activity Relationship (QSAR) or Quantitative Structure–Property Relationship (QSPR) models. Furthermore, with the advancement of machine learning techniques, the accuracy and speed of molecular property prediction have been enhanced. For instance, graph convolutional neural networks, convolutional neural networks, and recurrent neural networks have become popular in drug discovery and molecular analysis. Generative adversarial networks combined with certain machine learning strategies, such as supervised learning and reinforcement learning, have also been applied to the generation of new molecules and drug design. Contrastive learning, a self-supervised learning strategy often used for handling data sets without labels, has gained considerable popularity. However, deep learning approaches require large data sets to determine their numerous weights and may not be competitive when applied to small data sets.

Consequently, researchers began to seek directions for the initial molecular descriptors or molecular representations. Fundamentally, the primary objective of molecular representation methods is to establish a mapping model between the molecular structure and molecular properties. A typical example of a molecular descriptor is the one-dimensional sequence form of SMILES (Simplified Molecular Input Line Entry System). In SMILES, atoms and chemical bonds are represented by letters and punctuation marks, respectively, while branches are described using parentheses. Molecular fingerprints generally contain information about their molecular structure. Two-dimensional (2D) fingerprints mainly consist of four types: substructure key-based fingerprints, topology or path-based fingerprints, circular fingerprints, and pharmacophore fingerprints. However, 2D fingerprints tend to lose 3D structural information, particularly stereochemical descriptions.

To address the issue of 3D structural information loss in 2D fingerprints, 3D structure-based algebraic graph fingerprints have been developed to capture the 3D patterns of molecules. Chen et al. utilized algebraic graphs to assist bidirectional transformers in generating final molecular representations, which led to significant improvements in molecular property prediction tasks compared to other benchmark models; however, the 3D structural information they used was typically static. Biological research findings indicate that atoms and bonds within molecules are constantly in motion, suggesting that molecular-level analysis should also relate to dynamic 3D molecular conformations, especially when studying molecular activity and pharmacokinetic characteristics.

Molecular dynamics simulations are typically responsible for describing this dynamic process, capturing the time-dependent behavior of molecular systems, which can provide insights into their physicochemical properties and structural transformations. Building upon the static 3D structures of molecules by incorporating the time dimension, i.e., the kinetic features of molecules as they change over time, is referred to as the 4D-QSAR approach. This method was initially proposed by Hopfinger and colleagues in 1997; through an ensemble average of molecular conformational behavior, it integrates conformation and alignment into the development of 3D-QSAR models. The fourth dimension involves sampling the spatial characteristics of the conformational ensemble. In 2009, Martins et al. introduced a new 4D-QSAR method named LQTA-QSAR, based on generating a conformational ensemble profile (CEP) for each compound rather than a single conformation, followed by the calculation of 3D descriptors for a set of compounds. Although the 4D-QSAR method was proposed early on and has garnered growing interest, the number of publications reporting on it has slowed over the past decade. A possible reason for the waning interest is the extremely high computational power required to accurately obtain the conformational behavior of molecular compounds, with breakthrough results being scarce relative to the substantial costs involved.

Molecular dynamics has long been the preferred method for simulating complex atomic systems, capable of performing precise calculations from first principles. However, directly obtaining spatiotemporal features of molecules via molecular dynamics requires enormous computational resources. A data set might contain tens of thousands of molecules, and conducting molecular dynamics simulations for each molecule individually would necessitate tens of thousands of CPU hours, potentially taking months even on high-performance computing clusters. Recently, deep learning models have emerged as a popular approach to accelerate molecular dynamics, offering a new paradigm for reducing computational costs. Researchers have attempted to train neural networks with quantities of molecular dynamics trajectory data, aiming to enable these networks to learn the rules governing conformational changes and to automatically generate additional trajectories of molecular movements. The accuracy of these models stems not only from the unique capability of neural networks to approximate high-dimensional functions but also from the appropriate handling of physical requirements, such as symmetry constraints. Wu et al. proposed the DIFFMD method, which relies on a score-based denoising diffusion generative model. This method uses conditionally noise-perturbed molecular structures based on atomic accelerations and treats the conformations from previous time frames as prior distributions for sampling, thereby generating the conformations for the next time frame.

Another challenge lies in the inadequate representation learning capabilities for ensembles of molecular conformations. Axelrod et al. tested the classification abilities of some 2D, 3D, and 4D models on a library of 278,758 bioactive samples from drug-like molecules, where each molecule included a large number of conformations generated using the CREST program to construct models encoding ensembles containing up to 200 conformations. In this task, the 4D models that explicitly encoded multiple conformations performed worse than the 3D models that encoded a single conformation, with the former incurring significantly higher computational costs. However, these ensembles of conformations are unstructured in arrangement; despite containing rich structural information, current models fail to capture conformational behavior information and genuinely incorporate temporal information.

In the field of protein macromolecules, Wu et al. introduced ProtMD, the first pretrained model based on encoding the dynamic information of proteins. Despite performing molecular dynamics simulations on only 64 protein–ligand pairs, ProtMD was tested across more than 3000 proteins and achieved significant success in predicting drug–protein affinity and ligand efficacy, demonstrating strong generalization ability. This successful initiative provides insights for predicting the properties of small molecular compounds. On one hand, ProtMD can simultaneously transform features and three-dimensional coordinates, truly incorporating temporal information into molecular encoding; on the other hand, training the ProtMD encoder with a limited amount of molecular dynamics trajectory data allows it to generalize to a larger number of unknown molecular structures. However, training a ProtMD encoder at a single time scale of 50 ps may introduce significant bias, as it is uncertain whether a given time step will be useful for molecular property prediction tasks. To address the potential bias introduced by a single ProtMD encoder, we consider training multiple encoders at different time scales and designing a dedicated multichannel feature learning network.

The ProtMD model, trained on a limited amount of molecular dynamics simulation data, cannot match the precision of existing tools (such as Amber). However, when dealing with large-scale data sets comprising hundreds of thousands of molecules, employing such a low-fidelity method (one that sacrifices accuracy in predicting molecular physical structures for computational efficiency) to update atomic coordinates can significantly reduce the consumption of computational resources. Drawing on multifidelity learning strategies, sacrificing a certain degree of accuracy for reasonable cost-effectiveness may represent a promising avenue for enhancing molecular representation. The core of multifidelity learning lies in combining a large volume of low-fidelity data with a small amount of high-fidelity data to achieve a balance between cost and performance. Studies by Buterez et al. have shown that using low-fidelity measurements as an economical substitute can improve the accuracy of molecular property predictions based on sparse and expensive high-fidelity data. This research is dedicated to developing a low-fidelity molecular dynamics simulation model that not only efficiently updates atomic coordinates but also synchronously transforms atom-level embedding features, thus significantly enhancing molecular representation. Through this approach, we aim to address computational efficiency issues in processing large-scale molecular data sets while maintaining sufficient predictive accuracy, providing an effective solution for molecular property prediction.

In this work, the ProtMD encoder serves as the backbone network for feature extraction. Although ProtMD was initially proposed to address challenges in the protein domain, our experimental findings reveal that after adjusting its structural parameters, it can function effectively as a molecular encoder. Molecular dynamics trajectory data sampled at different time steps are used to train a limited number of ProtMD instances with varying timespans. Assuming that the trained encoder ensemble already possesses good generalization capabilities, they are then encapsulated in the N-ProtMD tool. The N-ProtMD tool is a deep learning model for low-fidelity molecular dynamics simulations. Upon receiving an initial molecular object as input, it can automatically generate a collection of 3D molecular coordinate sets with time-dependent relationships and their associated transformed hidden atom-level embeddings. It is important to note that features output by individual ProtMD components are considered to carry time information in a vague sense, whereas the N-ProtMD tool encapsulates multiple ProtMD components carrying information from different time scales. To balance the biases introduced during training and generalization and to fully leverage the vague temporal information carried by low-fidelity models, an attention mechanism-based network is designed to integrate multichannel spatiotemporal features. Additionally, an atom masking prediction self-supervised learning task is designed to further enhance molecular representation, with the learned molecular representations subsequently applied to downstream tasks.

The contributions of this paper are as follows: (1) In contrast to traditional 4D-QSA(P)R methods, this paper proposes a multichannel spatiotemporal feature adaptive fusion network framework based on a low-fidelity molecular dynamics model, combining deep learning technology with molecular dynamics simulations, abbreviated as MCST-AFN. This framework effectively enhances molecular representation while significantly reducing computational costs. (2) A low-fidelity molecular dynamics simulation model named N-ProtMD has been developed, utilizing multiple ProtMD encoders carrying information from different time scales to capture the diversity and complexity of molecular dynamics trajectories, thereby overcoming, to some extent, the potential bias introduced by a single-time-scale encoder. (3) An adaptive fusion network based on an attention mechanism has been designed, essentially a Convolutional Block Attention Residual Network (CBAR-Net), capable of effectively integrating the vague spatiotemporal information output by encoders operating at different time scales and balancing the biases introduced during training and generalization. (4) Across multiple data sets for molecular property prediction, MCST-AFN has demonstrated significant superiority compared to other advanced methods.

Materials and Methods

Figure 1 illustrates the design of the MCST-AFN framework, which consists of three stages: (1) Future Conformation Prediction Pretraining Task; (2) Atom Masking Prediction Pretraining Task; (3) Downstream Task Fine-Tuning Prediction. The key idea behind MCST-AFN is to use the trained N-ProtMD tool (see Figure 2) to output multichannel atom-level embeddings carrying information from different time scales, which are then adaptively fused through a Convolutional Block Attention Residual Network (see Figure 3) to enhance molecular representation. Among these, the Convolutional Block Attention Module (CBAM) is a module that integrates channel and spatial attention.

Figure 1. Overview of the pretraining and fine-tuning of MCST-AFN. Stage 1: Future Conformation Prediction Pretraining Task, where the core component ProtMD_i of the N-ProtMD tool is trained using real molecular dynamics simulation data to predict future conformations. Stage 2: Atom Masking Prediction Pretraining Task, involving masking part of the atoms in a molecule and feeding it into the MCST-AFN to predict the type of the masked atoms. Stage 3: Downstream Task Prediction, where the initial molecular object is input, and the N-ProtMD tool outputs a collection of 3D molecular coordinate sets at multiple future time intervals along with jointly transformed atom-level embedding features. These embedding features are then processed through a CBAR-Net for the molecular property prediction.

Figure 2. Framework of the N-ProtMD tool. The input is the current conformation at a given time, and the output is a set of three-dimensional coordinates representing multiple frames at different future time spans, along with their jointly transformed hidden atomic-level embedding features.

Figure 3. Framework of the Convolutional Block Attention Residual Network. The input is the initial feature set of the molecule, which passes through a three-layer network module comprising convolutional block attention, regularization, and linear layer residual connections. Finally, the dimensionality is reduced via averaging to obtain fused features.

Future Conformation Prediction Task

Stage 1 in Figure 1 illustrates the workflow of the future conformation prediction task. To obtain spatiotemporal information from molecular dynamics trajectories and to enhance the generalization capability of MCST-AFN, the ProtMD components are pretrained with processed molecular dynamics simulation data. This pretraining approach, referred to as atomic-level future conformation prediction, is specifically designed to capture the local context information of each atom.

The number of atoms in a molecular compound is denoted as M. To construct the spatiotemporal sequence of small molecular compounds, each molecular dynamics trajectory obtained is defined to have T time steps. At each time step t ∈ [T], the molecular graph $G^{(t)} = (V^{(t)}, E^{(t)})$ is represented with atoms as nodes and bonds as edges, where atoms have their respective 3D coordinates $x^{(t)} \in \mathbb{R}^{M \times 3}$ and initial $\psi_h$-dimensional rotation-invariant features $h^{(t)} \in \mathbb{R}^{M \times \psi_h}$, such as atom types. Consequently, the molecular spatiotemporal sequence can be defined as $\{G^{(t)}\}_{t=1}^{T}$.

The conformation of the current time frame is denoted as $G^{(t)}$. In the future conformation prediction task, the target is to predict the spatial positions of the atoms in the next time frame, denoted as $G^{(t+1)}$. The ProtMD component is employed to accomplish this task. The objective is to maximize the likelihood of achieving this prediction.

$$\mathcal{L} = \log P\left(x^{(t+1)} \,\middle|\, \{G^{(i)}\}_{i=1}^{t};\, \theta\right) \tag{1}$$

In this process, the conditional probability P is modeled through the encoder $f_\theta$, where θ represents the trainable weight parameters of the encoder. The objective is to obtain a high-performance N-ProtMD tool, as shown in Figure 2, that can generate a series of future consecutive time-frame conformations $\{G^{(t+i)}\}_{i=1}^{N}$ from the molecular conformation of the current time frame $G^{(t)}$. To accomplish this goal, N ProtMD components were trained separately by uniformly sampling input–label pairs from the molecular spatiotemporal sequence $\{G^{(t)}\}_{t=1}^{T}$ at time intervals of 1 to N.

Atomic Masking Prediction Task

Stage 2 in Figure 1 demonstrates the workflow of the atomic masking prediction task. To ensure that MCST-AFN achieves high performance on the target task, a pretraining method inspired by Zhang et al. is adopted. This involves masking the types of some atoms within a molecule and then feeding the masked data into the MCST-AFN model to predict the types of the masked atoms. For molecules consisting of only a few atoms, it is ensured that at least one atom is masked. Specifically, 15% of the basic atom types in a molecule are randomly replaced with a [MASK] token, and the modified molecule is then fed into the MCST-AFN model to derive the final features of the molecule after the replacement. An additional feed-forward network layer is added to specialize in predicting the types of the masked atoms. The training objective is to enable MCST-AFN to predict the types of the masked atoms with the highest possible accuracy.
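As a minimal sketch of this corruption step, assuming atom types have already been mapped to dictionary indices (the helper name and structure below are illustrative, not the authors' released code):

```python
import random

MASK_RATIO = 0.15  # fraction of atom-type tokens hidden, per the text

def mask_atom_types(token_ids, mask_id):
    """Replace ~15% of atom-type tokens with [MASK], guaranteeing at least one.

    token_ids: list of atom-type indices for one molecule.
    mask_id:   index of the [MASK] token in the dictionary.
    Returns the corrupted sequence and the masked positions to be predicted.
    """
    n_mask = max(1, int(len(token_ids) * MASK_RATIO))  # >= 1 even for tiny molecules
    positions = random.sample(range(len(token_ids)), n_mask)
    corrupted = list(token_ids)
    for p in positions:
        corrupted[p] = mask_id
    return corrupted, positions
```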

Downstream Task Prediction

Stage 3 in Figure 1 illustrates the fine-tuning prediction workflow of MCST-AFN for downstream tasks. The pretrained N-ProtMD tool serves as a feature extractor with fixed parameters. When an initial molecular conformation is input, this tool outputs a set of three-dimensional coordinates of the molecule at multiple frames across different time spans, along with jointly transformed hidden atomic-level embedding features. These feature sets are then fed into the convolutional block attention residual network for adaptive fusion, as shown in Figure 3. Finally, the output is passed through a fully connected layer for molecular property prediction.

Input Representation

To better distinguish and extract features of the different types of atoms within molecules, in principle every element of the periodic table would need to be included in the dictionary. However, the number of atom types actually appearing in molecules is far smaller than the total number of elements in the periodic table. Therefore, based on statistical analysis, the dictionary strategy incorporates the 10 most fundamental elements, while other rarely encountered atom types are represented by the [UNK] token. Figure 4 presents the statistical ratio of molecules in the benchmark data sets containing only the 10 fundamental atom types. Additionally, in the pretraining task of atomic masking prediction, the [MASK] token is used to represent the masked atoms. To ensure that the number of atoms in each batch of molecules remains consistent during training, molecules with insufficient numbers of atoms are padded with the [PAD] token, which denotes virtual atoms. Thus, the dictionary includes the following tokens: [H], [C], [N], [O], [F], [S], [P], [Br], [I], [Cl], [UNK], [MASK], and [PAD].
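The dictionary itself is small enough to state directly; the sketch below assumes element symbols come from RDKit atom objects, and its 13 entries match the vocab_size hyperparameter listed in Table 3:

```python
# Token dictionary as described: 10 fundamental elements plus special tokens.
VOCAB = ["[H]", "[C]", "[N]", "[O]", "[F]", "[S]", "[P]", "[Br]", "[I]", "[Cl]",
         "[UNK]", "[MASK]", "[PAD]"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def encode_atom(symbol):
    """Map an element symbol (e.g., 'C' or 'Se') to a token id; rare elements map to [UNK]."""
    return TOKEN_TO_ID.get(f"[{symbol}]", TOKEN_TO_ID["[UNK]"])
```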

Figure 4. Statistical ratio of molecules in the benchmark data sets containing only the 10 fundamental atom types. "Comply" indicates the percentage of molecules that contain only the 10 fundamental atom types out of the total number of molecules in the data set. "Exceed" represents the percentage of molecules that contain atom types beyond the 10 fundamental ones relative to the total number of molecules in the data set.

Furthermore, since MCST-AFN is constructed based on the three-dimensional spatial information on molecules, it not only accepts atomic types as input but also utilizes the three-dimensional spatial coordinates of atoms as input.

N-ProtMD Tool

The ProtMD model employs a prompt tuning approach to capture the temporal dependencies across different time spans. For clarity, the ProtMD model that outputs the i-th frame conformation is denoted as ProtMD_i. By inputting $G^{(t)}$ into the ProtMD model and concatenating the encoding vector $h_{\text{prompt}} \in \mathbb{R}^{\psi_{\text{prompt}}}$ of the time-span prompt with the initial atomic features $h^{(t)}$, the model can exhibit strong performance in predicting new conformations after a certain time period. Specifically, for the ProtMD_i model, prompt = i is used and encoded to obtain the corresponding $h_{\text{prompt}}$, which is then concatenated with $h^{(t)}$. Lastly, the concatenated features are passed through network layers with the E(n) Equivariant Graph Neural Network (EGNN) as the backbone, incorporating a global attention mechanism. As a result, the atomic spatial coordinates $x^{(t+i)}$ and their latent features $h^{(t+i)}$ for the i-th frame conformation $G^{(t+i)}$ can be obtained.

$$h^{(t+i)},\, x^{(t+i)} = \mathrm{ProtMD}_i\left(h^{(t)},\, x^{(t)},\, h_{\text{prompt}}\right) \tag{2}$$

N trained ProtMD instances, encoding hidden states at different time spans, are encapsulated in the N-ProtMD tool, which serves as a feature extractor. During the fine-tuning process for downstream tasks, the parameters of this tool are kept fixed. Given the initial atomic features $h^{(t)}$ and coordinates $x^{(t)}$ as input, the N-ProtMD tool generates the atomic coordinate and feature set $\{(x^{(t+i)}, h^{(t+i)})\}_{i=1}^{N}$, where $h^{(t+i)} \in \mathbb{R}^{M \times \psi_h}$. To integrate these features, each element of $\{h^{(t+i)}\}_{i=1}^{N}$ is concatenated along a new dimension to obtain the initial molecular features $h_{4D} \in \mathbb{R}^{N \times M \times \psi_h}$.
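A sketch of how the frozen components could be composed into a multichannel extractor, with Eq 2 as the per-component interface (module and argument names are assumptions, not the released implementation):

```python
import torch

def n_protmd_forward(protmd_models, h_t, x_t, prompts):
    """Run N frozen ProtMD_i components and stack their outputs channel-wise.

    protmd_models: list of N trained ProtMD_i modules (parameters frozen).
    h_t:     (M, psi_h) initial rotation-invariant atom features.
    x_t:     (M, 3)     initial atomic coordinates.
    prompts: list of N time-span prompt embeddings h_prompt (see Eq 2).
    Returns h_4d of shape (N, M, psi_h) plus the N predicted coordinate sets.
    """
    feats, coords = [], []
    with torch.no_grad():  # the feature extractor stays fixed during fine-tuning
        for protmd_i, h_prompt in zip(protmd_models, prompts):
            h_next, x_next = protmd_i(h_t, x_t, h_prompt)  # Eq 2
            feats.append(h_next)
            coords.append(x_next)
    h_4d = torch.stack(feats, dim=0)  # concatenate along the new channel dimension
    return h_4d, coords
```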

Convolutional Block Attention Residual Network

After obtaining the spatiotemporal features $h_{4D} \in \mathbb{R}^{N \times M \times \psi_h}$ of the molecule, the convolutional block attention derives one-dimensional channel attention weights $M_c \in \mathbb{R}^{N \times 1 \times 1}$ and two-dimensional spatial attention weights $M_s \in \mathbb{R}^{1 \times H \times W}$ (here $H = M$ and $W = \psi_h$). Consequently, the process of feature refinement through the convolutional block attention can be summarized as follows

$$h'_{4D} = M_c(h_{4D}) \otimes h_{4D} \tag{3}$$

$$h''_{4D} = M_s(h'_{4D}) \otimes h'_{4D} \tag{4}$$

In the equation, ⊗ denotes element-wise multiplication. To ensure that features do not degrade during propagation, residual connections are employed, yielding

$$h^{(1)}_{4D} = h_{4D} \oplus h''_{4D} \tag{5}$$

In the equation, ⊕ denotes element-wise addition. Subsequently, layer normalization is applied followed by a feed-forward layer, and once again, residual connections are utilized. The process can be summarized as follows

$$h^{(2)}_{4D} = \mathrm{FeedForward}\big(\mathrm{LN}\big(h^{(1)}_{4D}\big)\big) \tag{6}$$

$$h^{(3)}_{4D} = \mathrm{LN}\big(h^{(1)}_{4D} \oplus h^{(2)}_{4D}\big) \tag{7}$$

In the equations, FeedForward represents the feed-forward layer, and LN denotes the layer normalization operation. The feed-forward layer can capture more complex feature representations through the combination of multiple neurons. The aforementioned feature propagation process is repeated three times, meaning that $h_{4D}$ must pass through a residual connection network built on three layers of convolutional block attention. Finally, dimensionality reduction is performed by averaging to obtain the final molecular features $h^{\text{end}}_{4D} \in \mathbb{R}^{M \times \psi_h}$, which can be used for molecular property prediction through a fully connected layer. It is worth noting that during the fine-tuning process for downstream tasks, the parameters of the convolutional block attention residual network are not fixed and participate in the parameter update process.
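For concreteness, a PyTorch sketch of one CBAR-Net layer following Eqs 3–7 is given below, assuming the standard CBAM formulation of channel and spatial attention; module names and the reduction ratio are illustrative:

```python
import torch
import torch.nn as nn

class CBAMResidualBlock(nn.Module):
    """One CBAR-Net layer (Eqs 3-7); input h has shape (B, N, M, psi_h)."""

    def __init__(self, n_channels, psi_h, reduction=4):
        super().__init__()
        # Channel attention M_c: pooled descriptors -> shared MLP -> sigmoid
        self.mlp = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels))
        # Spatial attention M_s: conv over stacked average/max maps
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.ln1 = nn.LayerNorm(psi_h)
        self.ln2 = nn.LayerNorm(psi_h)
        self.ff = nn.Sequential(nn.Linear(psi_h, psi_h), nn.GELU(),
                                nn.Linear(psi_h, psi_h))

    def forward(self, h):
        m_c = torch.sigmoid(self.mlp(h.mean(dim=(2, 3))) +
                            self.mlp(h.amax(dim=(2, 3))))[:, :, None, None]
        h1 = m_c * h                                        # Eq 3
        sp = torch.cat([h1.mean(1, keepdim=True),
                        h1.amax(1, keepdim=True)], dim=1)
        h2 = torch.sigmoid(self.conv(sp)) * h1              # Eq 4
        h3 = h + h2                                         # Eq 5, residual
        h4 = self.ff(self.ln1(h3))                          # Eq 6
        return self.ln2(h3 + h4)                            # Eq 7
```

Stacking three such blocks and averaging over the channel dimension then yields $h^{\text{end}}_{4D}$, as described above.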

Results and Discussion

Application Data Sets

Data Set for the Future Conformation Prediction Task

For the pretraining task of future conformation prediction, 50 molecular compounds were carefully selected from DrugBank based on their molecular weights and the diversity of atoms they contain. Efforts were made to ensure a balanced distribution of molecular weights while selecting compounds rich in a greater variety of atom types. The final selection of molecules included ten types of atoms: H, C, N, O, F, S, P, Br, I, and Cl. Subsequently, these molecules were processed using the Amber tool to generate dynamics trajectory data for each of the 50 molecules, and 1,000 snapshots were retained per molecule, resulting in a total of 50,000 snapshots. Finally, further sampling was conducted on the 50,000 snapshots to train the ProtMD encoders with different time spans. For instance, with a time span of five, the first snapshot was taken as the input and the corresponding label was the sixth snapshot; at the 996th snapshot, there is no corresponding label, so the process ends there. Consequently, 995 samples per molecule were actually utilized. The experimental procedure for molecular dynamics simulations is detailed in Appendix B.
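The interval sampling just described amounts to pairing each snapshot with the frame a fixed span ahead; a minimal sketch:

```python
def make_span_samples(snapshots, span):
    """Pair each trajectory frame with the frame `span` steps ahead as its label.

    snapshots: list of per-molecule trajectory frames (1,000 here).
    With span=5 this yields 995 (input, label) pairs, matching the text.
    """
    return [(snapshots[i], snapshots[i + span])
            for i in range(len(snapshots) - span)]
```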

In the experiment, 80% of the data were used for training, while 20% were reserved for evaluation.

Data Set for the Atomic Masking Prediction Task

To enhance the generalizability of the MCST-AFN method, 250,000 molecules sampled from ZINC15 were chosen for the pretraining task of atomic masking prediction. The Experimental-Torsion Knowledge Distance Geometry (ETKDG) algorithm within RDKit was utilized to obtain simulated three-dimensional coordinates of the atoms within the molecules, followed by energy minimization using the Merck Molecular Force Field (MMFF) to optimize the molecular geometry, ensuring it reached a stable state of minimum energy. Among the 250,000 molecules, some could not generate 3D conformations through RDKit; for these, planar 3D conformations (with zero z-axis coordinates) derived directly from the molecular graph were used instead.
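A sketch of this conformer generation step with the RDKit calls named above (ETKDG embedding followed by MMFF optimization); the fallback mirrors the planar-coordinate substitution used for molecules that fail to embed:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def embed_3d(smiles):
    """Generate a 3D conformer with ETKDG, then relax it with MMFF."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, AllChem.ETKDG()) == -1:
        # Embedding failed: fall back to planar coordinates (z = 0)
        # derived directly from the molecular graph, as described above.
        AllChem.Compute2DCoords(mol)
        return mol
    AllChem.MMFFOptimizeMolecule(mol)  # Merck Molecular Force Field relaxation
    return mol
```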

In the experiment, 95% of the data were randomly selected for training purposes, while the remaining 5% were reserved for evaluation.

Data Sets for the Downstream Task

To comprehensively evaluate the performance of the MCST-AFN, experiments were conducted on 13 benchmark data sets from MoleculeNet. MoleculeNet is a widely utilized benchmark collection for molecular property prediction. The selected 13 benchmark data sets encompass a range of molecular property prediction tasks, from quantum mechanics to pharmacology, with the number of molecules varying from less than 1K to approximately 134K. Detailed information about the data sets is summarized in Table 1. In the QM9 data set, the labels homo, lumo, and gap were chosen following previous settings, as these properties have similar scales. For energy-related properties, the training process employed units of electron volts (eV).

Table 1. Statistical Information of the Benchmark Data Sets Utilized in This Work, Including Data Type, Number of Tasks, Number of Molecules, Task Type, and Evaluation Metric.
data set data type #tasks #molecules task type metric
BBBP SMILES 1 2039 classification ROC-AUC
BACE SMILES 1 1513 classification ROC-AUC
ClinTox SMILES 2 1478 classification ROC-AUC
Tox21 SMILES 12 7831 classification ROC-AUC
ToxCast SMILES 617 8575 classification ROC-AUC
SIDER SMILES 27 1427 classification ROC-AUC
HIV SMILES 1 41,127 classification ROC-AUC
ESOL SMILES 1 1128 regression RMSE
FreeSolv SMILES 1 642 regression RMSE
Lipop SMILES 1 4200 regression RMSE
QM7 SMILES, 3D coordinates 1 7160 regression MAE
QM8 SMILES, 3D coordinates 12 21,786 regression MAE
QM9 SMILES, 3D coordinates 3 133,885 regression MAE

For benchmark data sets lacking 3D coordinates, RDKit was used to generate them. For classification tasks, the evaluation metric adopted was the Receiver Operating Characteristic Area Under the Curve (ROC-AUC). In regression tasks, the Root Mean Square Error (RMSE) was used as the evaluation metric for the FreeSolv, ESOL, and Lipop data sets, whereas the Mean Absolute Error (MAE) was employed for the QM series data sets. A random splitting strategy was applied across all downstream data sets, with a ratio of 8:1:1 for training, validation, and testing. Additionally, the types of atoms contained within the 13 benchmark data sets are detailed in Appendix A.

Baseline

The proposed method was compared to other competitive baseline models.

GCN (Graph Convolutional Network), MPNN (Message Passing Neural Network), and D-MPNN (Directed Message Passing Neural Network) are GNN methods that have not undergone pretraining. Among them, MPNN serves as a generalized framework for graph-based models, discarding handcrafted features and enabling the application of GNNs to molecular graphs. D-MPNN, a variant of the generic MPNN architecture inspired by structure2vec, adopts a message-passing paradigm based on directed edges rather than nodes.

MolCLR is a molecular contrastive learning pretraining method based on GNN representation, which employs three molecular graph augmentation techniques: atom masking, bond deletion, and subgraph removal, to maximize the consistency between augmented graphs of the same molecule and minimize the consistency between those of different molecules. KCL attempts to use a knowledge-guided graph augmentation module for pretraining, applying weights to the atom nodes in molecular graphs based on the periodic table, thereby incorporating fundamental domain knowledge of elements into the molecular augmented graphs alongside topological structure knowledge. KANO extends upon the KCL approach, presenting a novel strategy that integrates professional knowledge from the chemical field to enhance the performance of molecular property prediction tasks.

3D Infomax utilizes the three-dimensional structures of molecules for pretraining, inferring the geometric shapes of molecules given their two-dimensional molecular graphs, thereby equipping the model with three-dimensional structural information. Uni-Mol represents a universal framework for 3D molecular representation learning, diverging from the one-dimensional sequences or two-dimensional molecular graphs commonly used by most models. Instead, it directly leverages the three-dimensional structures of molecules as both inputs and outputs, utilizing the 3D information to train the model. GEM features a specially designed geometry-based graph neural network architecture along with several dedicated geometric-level self-supervised learning strategies to learn molecular geometric knowledge.

For all baseline methods, results reported are either from the reference literature or those provided by the original authors. Where tests for specific data sets were not conducted in the original sources, additional testing was performed, and the results are supplemented in the tables.

Experimental Setup

The N-ProtMD tool encapsulates eight ProtMD encoders with different time-span encodings. The number of layers in the convolutional block attention residual network is set to 3. For both the pretraining task of atom masking prediction and the downstream molecular property prediction, a dedicated fully connected layer is configured for output. For the two pretraining tasks, a full-parameter update strategy is employed. As an example, with a sampling interval of five time steps, the hyperparameter settings for training the ProtMD encoder using sampled molecular dynamics trajectory data are detailed in Table 2, where one time step unit is defined as 50 ps. The hyperparameters for the pretraining task of MCST-AFN on atom masking prediction are presented in Table 3. During fine-tuning for downstream tasks, the parameters of the N-ProtMD tool, acting as a feature extractor, are frozen, while the parameters of the convolutional block attention residual network and the fully connected layer are trained and updated; the specific hyperparameter settings are provided in Table 4. Notably, experimental results have demonstrated that when testing the MCST-AFN on the QM9 data set, updating the parameters of the N-ProtMD tool yields better performance, with the activation function of the fully connected layer set to SELU. Conversely, when tested on the other 12 data sets, the activation function of the fully connected layer is set to GELU.

Table 2. Hyperparameter Settings for the Pretraining of the ProtMD5 Encoder in Future Conformation Prediction.

hyperparameter description value
epoch the number of training epochs. 25
batch size the input batch size of training. 512
lr the learning rate of training. 1e-4
optimizer the optimizer type of training. Adam
training_loss the loss function of training. L1Loss
val_loss the loss function of validation. L1Loss
dim the node hidden size for ProtMD. 128
num_tokens the number of atom classes. 100
depth the number of stacked layers for ProtMD. 3
num_nearest_neighbors the number of nearest neighbors for ProtMD. 32
dropout the dropout rate. 0.15
global_linear_attn_every the number of attention layers for EGNN. 1
global_linear_attn_heads the number of heads in the attention layer. 8
num_prompt the vocabulary size of the timespan. 9
prompt the value of the time span. 5
max_len maximum number of nodes for the input graph. 100

Table 3. Hyperparameter Settings for the Pretraining of MCST-AFN in Atom Masking Prediction.

hyperparameter description value
epoch the number of training epochs. 15
batch size the input batch size of training. 32
lr the learning rate of training. 1e-4
optimizer the optimizer type of training. Adam
training_loss the loss function of training. Cross-Entropy
val_loss the loss function of validation. Cross-Entropy
dim the node hidden size for MCST-AFN. 128
vocab_size the output dimension of the output layer 13
num_tokens the number of atom classes. 100
depth the number of stacked layers for ProtMD. 3
num_nearest_neighbors the number of nearest neighbors for ProtMD. 32
dropout the dropout rate. 0.15
global_linear_attn_every the number of attention layers for EGNN. 1
global_linear_attn_heads the number of heads in the attention layer. 8
num_prompt the vocabulary size of the timespan. 9
max_len max number of nodes for the input graph. 100

Table 4. Hyperparameter Settings for the Fine-Tuning of MCST-AFN in Downstream Tasks.

hyperparameter description range
epoch the number of training epochs. [100, 200, 500]
batch size the input batch size of training. [8, 16, 32]
lr the learning rate of training. [1e-4, 5e-4, 1e-3]
optimizer the optimizer type of training. Adam

Details of Self-Supervised Learning Task Experimentation

Pretrained ProtMD

Self-supervised learning tasks for future conformation prediction are conducted on ProtMD. Taking the ProtMD5 model trained with a time span of five time steps as an example, distributed training is performed on two RTX 2080 Ti GPUs, with hyperparameter settings as detailed in Table 2. Figure 5a,b illustrate the changes in the loss function over epochs during the training of ProtMD5 with 25 and 50 molecules, respectively. In Figure 5b, there is a significant decrease in the loss function value within the first five epochs; in subsequent epochs, however, the loss function fluctuates and struggles to converge further. At epoch 19, both the training and validation losses reach their minimum simultaneously, and the entire process took about 2 h. Similarly, in Figure 5a, the lowest point of the training and validation losses is reached at epoch 8. Subsequent experiments use the models with the lowest loss function values. The N-ProtMD tool requires training N ProtMD models, and training with 50 molecules takes approximately 2N hours.

Figure 5. Line graphs of training loss and validation loss for ProtMD5: (a) trained with 25 molecules and (b) trained with 50 molecules.

Training a neural network to approximate real molecular dynamics simulations proved to be more challenging than anticipated. For the training with 50 molecules, a larger number of epochs was originally planned, but because gradient vanishing or explosion occurred beyond 25 epochs, training was terminated early. The difficulty in converging the loss function may be attributed to structural bottlenecks inherent in ProtMD and to the complexity of capturing the intrinsic movement patterns of the molecules. Recent studies have reported new models, such as DIFFMD, which offer alternative solutions toward the goal of accelerating molecular dynamics simulations with deep learning. However, the uniqueness of ProtMD lies in its approach of encoding the changes in atomic positions obtained from training directly into the atomic features, a process that is also considered to incorporate spatiotemporal information into the features. Therefore, despite the training outcomes of ProtMD falling short of expectations, it has been chosen as the foundation for the multichannel spatiotemporal feature extractor, the N-ProtMD tool, leaving the application of such improved encoders to future work.

The unit for the components of atomic coordinates is Å (angstroms), and the training and validation loss functions, represented by the Mean Absolute Error (MAE) of the atomic coordinate components, are expressed as

$$L_1 = \frac{1}{BS \times M \times 3} \sum_{i=1}^{BS} \sum_{j=1}^{M} \sum_{k=1}^{3} \left| \hat{y}_{ijk} - y_{ijk} \right| \tag{8}$$

Here, BS denotes the batch size of the data, M represents the number of atoms in a molecule, 3 indicates the dimension of the atomic coordinate components (x, y, z), $\hat{y}_{ijk}$ represents the predicted atomic coordinates, and $y_{ijk}$ denotes the actual atomic coordinate labels.
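Eq 8 is simply a mean absolute error over all coordinate components; a one-line PyTorch equivalent (tensor shapes as defined above):

```python
import torch

def coordinate_mae(pred, target):
    """Eq 8: MAE over all coordinate components of a batch.

    pred, target: (BS, M, 3) tensors of predicted and reference coordinates (Å).
    Equivalent to torch.nn.L1Loss() with the default 'mean' reduction.
    """
    return (pred - target).abs().mean()
```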

Pretrained MCST-AFN

Self-supervised learning tasks for atom masking prediction are conducted on MCST-AFN. In the experiment, two versions of the N-ProtMD tool were used to train two MCST-AFN models separately, with the two versions referring to dynamics trajectory data from 25 and 50 molecules, respectively. Distributed training was carried out on two RTX 2080 Ti GPUs, with hyperparameter settings as detailed in Table 3. The loss function used during training was the multiclass cross-entropy loss. Figure 6a,b show the changes in the training loss and validation accuracy, respectively. Specifically, on the x-axis of Figure 6a, each step represents the use of 32,000 data samples. From Figure 6, it can be observed that during training, the loss of MCST-AFN levels off by the 20th step, gradually reaching convergence; in the validation phase, by the end of the first epoch, the test accuracy of both versions already exceeds 0.98, indicating that after just one epoch of training the model is already capable of effectively recognizing the types of masked atoms. The model was trained for 15 epochs with a batch size of 32, and the entire process took approximately 41 h. The model instance with the highest validation accuracy is selected for fine-tuning tests in downstream scenarios.
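A sketch of the masked-atom objective, assuming (as is standard for masked prediction) that the cross-entropy is evaluated only at the masked positions; names are illustrative:

```python
import torch.nn.functional as F

def masked_atom_loss(logits, targets, mask):
    """Multiclass cross-entropy restricted to masked atoms.

    logits:  (B, M, 13) per-atom type predictions (13 = dictionary size).
    targets: (B, M)     original atom-type token ids.
    mask:    (B, M)     boolean tensor marking the masked atoms.
    """
    return F.cross_entropy(logits[mask], targets[mask])
```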

6.

6

(a) Line graph showing the change in the loss function of MCST-AFN with steps during training and (b) line graph depicting the change in validation accuracy of MCST-AFN with epochs during validation.

MCST-AFN Test Results

According to the experimental setup in Table 4, we conducted tests on the 13 benchmark data sets with different learning rates. In addition to employing the Adam optimizer, we adaptively adjusted the learning rate based on validation performance: if the validation metric did not improve within 10 epochs, the learning rate was reduced to 0.6 times its previous value. Consequently, the choice of initial learning rate did not lead to significant differences in test results, though optimal initial learning rates varied across data sets to ensure the best performance. However, the initial learning rate was constrained within a specified range: exceeding 1e-3 could cause gradient explosion, while values below 1e-4 would unnecessarily prolong training time. We also implemented an early stopping strategy during training, halting the process if the validation metric failed to improve over 50 consecutive epochs. For downstream fine-tuning tests using MCST-AFN, taking the BBBP data set (containing nearly 2,000 molecules) as an example, training for 100 epochs with a batch size of 32 required approximately 2 h.
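The schedule described here maps naturally onto PyTorch's ReduceLROnPlateau; the sketch below uses placeholder model and validation routines:

```python
import torch

def fine_tune(model, validate, lr=1e-4, max_epochs=500):
    """Training-loop skeleton for the schedule above (illustrative).

    validate: callable returning the validation metric (lower is better here).
    The lr shrinks to 0.6x after 10 stagnant epochs; training stops after
    50 consecutive epochs without validation improvement (early stopping).
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.6, patience=10)
    best, stale = float("inf"), 0
    for _ in range(max_epochs):
        # ... one epoch of training with `optimizer` goes here ...
        metric = validate(model)
        scheduler.step(metric)
        if metric < best:
            best, stale = metric, 0
        else:
            stale += 1
            if stale >= 50:
                break
```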

Tables 5 and 6 report the overall performance of MCST-AFN and the nine existing methods. The difference between Ours-1 and Ours-2 lies in the number of molecules in the molecular dynamics data used for pretraining the N-ProtMD tool: 25 molecules for the former and 50 for the latter. In the tables, the best performance is highlighted in bold, and the second-best performance is underlined to facilitate clear comparisons. When evaluating multilabel tasks, such as testing on the QM9 data set, only one instantiated model was trained, with the output dimension of the fully connected layer adjusted to match the number of labels.

Table 5. Comparison of ROC-AUC for Seven Classification Tasks in MoleculeNet.

metric ROC-AUC
Data set BACE BBBP ClinTox Tox21 ToxCast SIDER HIV
#Molecules 1513 2039 1478 7831 8575 1427 41127
#Tasks 1 1 2 12 617 27 1
GCN[2016] 0.854 0.877 0.807 0.772 0.650 0.638 0.740
MPNN[2017] 0.815 0.913 0.815 0.741 0.691 0.621 0.771
D-MPNN[2019] 0.852 0.919 0.852 0.821 0.718 0.632 0.770
MolCLR*[2022] 0.850 0.724 0.880 0.784 0.691 0.597 0.778
KCL*[2022] 0.924 0.956 0.898 0.821 0.714 0.671 0.770
KANO*[2023] 0.925 0.941 0.926 0.819 0.718 0.651 0.765
3D Infomax*[2022] 0.794 0.691 0.594 0.745 0.644 0.534 0.761
Uni-Mol[2023] 0.857 0.729 0.919 0.791 0.696 0.655 0.808
GEM[2022] 0.856 0.724 0.901 0.781 0.692 0.672 0.806
Ours-1 0.881 0.949 0.886 0.833 0.747 0.645 0.794
Ours-2 0.873 0.961 0.905 0.815 0.775 0.642 0.809
The asterisk (*) indicates results obtained from our own testing, such as KCL* and KANO*.

Table 6. RMSE and MAE Comparisons for Six Regression Tasks in MoleculeNet.

metric RMSE MAE
Data set ESOL FreeSolv Lipop QM7 QM8 QM9
#Molecules 1128 642 4200 7160 21786 133885
#Tasks 1 1 1 1 12 3
GCN[2016] 1.431 2.900 0.712 122.9 0.0257 0.00548
MPNN[2017] 1.167 2.185 0.852 124.8 0.0198 0.00522
D-MPNN[2019] 1.050 2.177 0.672 111.4 0.0148 0.00514
MolCLR*[2022] 0.911 2.021 0.875 89.8 0.0179 0.00475
KCL*[2022] 0.670 0.854 0.789 89.8 0.0145 0.00417
KANO*[2023] 0.713 0.521 0.489 55.2 0.0157 0.00387
3D Infomax*[2022] 0.798 1.855 0.880 103.5 0.0156 0.00425
Uni-Mol[2023] 0.788 1.480 0.603 41.8 0.0156 0.00467
GEM[2022] 0.798 1.877 0.660 58.9 0.0171 0.00746
Ours-1 0.551 1.492 0.484 47.4 0.0131 0.00459
Ours-2 0.538 1.216 0.534 41.5 0.0128 0.00414
The asterisk (*) indicates results obtained from our own testing, such as KCL* and KANO*.

Conclusions drawn from Tables 5 and 6 include: (1) MCST-AFN was tested across 13 benchmark data sets, achieving an average performance improvement of 2.10% on the 12 data sets other than FreeSolv, with the largest improvement of 19.70% observed on the ESOL data set; (2) pretraining the N-ProtMD tool with molecular dynamics data from different numbers of molecules has a certain impact on the prediction performance of MCST-AFN, with a general trend of slight performance enhancement as the number of molecules increases; (3) Figure 4 indicates that among the 13 benchmark data sets, 7 contain molecules with atoms outside the 10 basic atom types; in conjunction with the test results, this supports the effectiveness of using the unified representation [UNK] for less common atom types.

Ablation Studies

Ablation studies are conducted to verify the importance of each component to the model's performance. Tables 7 and 8 present the predictive performance of the model after excluding the pretraining steps and omitting various components, respectively. The best-performing results in the benchmark tests are highlighted in bold.

Table 7. Comparison of ROC-AUC for Seven Classification Tasks in MoleculeNet in the Ablation Studies.

metric ROC-AUC
Data set BACE BBBP ClinTox Tox21 ToxCast SIDER HIV
MCST-AFN 0.873 0.961 0.905 0.815 0.775 0.642 0.809
w/o pretraining-1 0.839 0.927 0.864 0.796 0.728 0.638 0.785
w/o pretraining-2 0.701 0.841 0.807 0.758 0.701 0.612 0.694
w/o CBAM 0.779 0.880 0.842 0.774 0.738 0.619 0.725
w/o average 0.865 0.943 0.868 0.807 0.759 0.622 0.793

Table 8. Comparison of RMSE and MAE for Six Regression Tasks in MoleculeNet in the Ablation Studies.

metric RMSE MAE
Data set ESOL FreeSolv Lipop QM7 QM8 QM9
MCST-AFN 0.538 1.216 0.534 41.5 0.0128 0.00414
w/o pretraining-1 0.610 1.799 0.628 55.7 0.0146 0.00537
w/o pretraining-2 0.637 2.288 0.838 53.2 0.0178 0.00485
w/o CBAM 0.712 2.623 0.916 52.7 0.0245 0.00706
w/o average 0.549 1.585 0.553 46.6 0.0180 0.00978

The complete MCST-AFN model is obtained by pretraining the encoder with molecular dynamics data from 50 molecules. "w/o pretraining-1" indicates the removal of the future conformation prediction pretraining task, with the weight parameters of the N-ProtMD tool initialized from a random normal distribution. "w/o pretraining-2" denotes the removal of the atom masking prediction pretraining task, with the CBAM weight parameters initialized from a random normal distribution. "w/o CBAM" signifies the removal of the convolutional block attention module (CBAM) from the Convolutional Block Attention Residual Network (CBAR-Net). "w/o average" means that after passing through the CBAM residual network, the maximum value is taken across the channel dimension of the molecular representation instead of the average.

The experimental results presented in Tables 7 and 8 demonstrate that removing the pretraining steps or individual components readily leads to a decline in model performance. In particular, the absence of the convolutional block attention component results in a significant performance drop, averaging a decrease of 47.9%. This indicates that this attention mechanism effectively integrates multichannel spatiotemporal features and mitigates biases introduced during training. The ablation studies confirm the indispensability of the pretraining steps and the two components in MCST-AFN, suggesting that the complete MCST-AFN can learn richer molecular representations, thereby enhancing the predictive performance of downstream tasks.

Case Study

The N-ProtMD tool, in essence, is a deep learning model that functions as a low-fidelity molecular dynamics simulation, trained on real molecular dynamics simulation data, with its core component being the ProtMD module. The uniqueness of the ProtMD encoder lies in its ability to simultaneously transform features and three-dimensional coordinates. Specifically, the atom-level embedding vectors generated by the encoder are highly sensitive to changes in atomic positions due to the incorporation of interatomic distances during the encoding process. When the relative positions of atoms change, their embedding vectors also change accordingly. Therefore, it is necessary to investigate the differences between the new coordinates output by the trained N-ProtMD tool after different time spans and the initial input coordinates. An experimental case was selected using a particular molecule, whose SMILES notation is as follows

NCC(CC(O)=O)C1=CC=C(Cl)C=C1

First, the initial 3D conformation of the molecule was generated using RDKit, and the movement of the conformations output by the N-ProtMD tool was then visualized using the PyMOL software. Eight subplots annotated with the Root Mean Square Deviation (RMSD) were plotted with the snapshot of the molecule at the initial moment as the background reference, as shown in Figure 7. These images reveal that as the observation timeline extends, there is a trend toward molecular compression, which can be interpreted as contraction and vibration of the internal chemical bonds within the molecule. Unfortunately, within the given time frame, the movement patterns of the molecule are relatively monotonous, failing to exhibit more diversified behaviors such as bond rotation.

Figure 7. Composite images of molecular conformation snapshots at various time intervals after the initial moment, each overlaid on a snapshot from the initial time point. T denotes the initial time, and Δt represents the size of the time interval. Panels (a) through (h) compare the input molecule with the molecule at the first through eighth time frames, respectively.

Further analysis revealed that the molecular dynamics simulation data used for training were sampled at 50 ps intervals over a simulation time of 50 ns, which is sufficient to meet the training requirements. However, compared to protein macromolecules, small molecules exhibit more flexible motion patterns owing to the absence of complex secondary-structure constraints; their motion patterns are therefore difficult to capture accurately, posing significant challenges in constructing highly realistic deep learning models of molecular dynamics simulations. Indeed, visualizing the three-dimensional coordinate sets output by the N-ProtMD tool for the case molecule confirmed that N-ProtMD is a low-fidelity model, consistent with our expectations. It nevertheless reflects temporal progression to some extent, in that the average distance between atoms tends to decrease as the time step increases.

Investigating the impact of this temporal information on atomic embeddings is crucial. The t-SNE technique reduces data dimensionality, facilitating visualization. Employing t-SNE to visualize the atom-level embeddings output by the N-ProtMD tool for the case molecule, as shown in Figure 8, makes it evident that the atom-level embeddings output by the ProtMD components, which carry distinct temporal-scale information, are clearly distinguished, with significant differences among the molecular feature clusters at each time point (Figure 8a). Figure 8b, in turn, compares the reduced-dimensionality visualizations of the atom-level embeddings for the case molecule and another molecule, demonstrating not only that molecular feature clusters at different time points are clearly distinguished but also that distinctiveness and diversity are maintained among different molecules at the same time point. The SMILES notation for the molecule used in the comparison is as follows:

CC1=C2C=CC=NC2=C(O)C(Br)=C1

Figure 8. Atomic-level embeddings of molecular objects, output by the N-ProtMD tool and visualized using t-SNE. Here, "1st" denotes the atomic-level embedding features corresponding to one time-span unit, "2nd" those corresponding to two time-span units, and so forth. Similarly, "self-1st" represents the atomic-level embedding features of the case molecule for one time-span unit, while "ref-1st" indicates those of the reference molecule for one time-span unit, with this pattern continuing accordingly. (a) Visualization of the atomic-level embedding features of the case molecule. (b) Comparative visualization of the atomic-level embedding features of the case molecule and the reference molecule.
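A sketch of this visualization step, assuming the stacked N-ProtMD outputs are available as a NumPy array; scikit-learn's TSNE and matplotlib are our choices here, as the paper does not specify its plotting stack:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_channel_embeddings(h_4d):
    """Project stacked embeddings (N, M, psi_h) to 2D and color by channel.

    Each point is one atom from one time-span channel; clusters separating
    by color correspond to the per-time-point clusters seen in Figure 8a.
    """
    n, m, _ = h_4d.shape
    flat = h_4d.reshape(n * m, -1)           # one row per atom per channel
    labels = np.repeat(np.arange(n), m)      # time-span channel of each row
    xy = TSNE(n_components=2, perplexity=30).fit_transform(flat)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=10)
    plt.colorbar(label="time-span channel")
    plt.show()
```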

For the pretraining task of our model, we employed molecular dynamics simulation data to incorporate partial temporal information. The visualization experiments of atom-level embeddings illustrate that the N-ProtMD tool incorporates temporal concepts during the molecular characterization process, providing clear identifiers for the state of molecules at different time points while maintaining heterogeneity among molecules at each time point. This effectively enhances the molecular characterization.

Investigation of Model Interpretability

Multichannel Fusion Strategy vs Single-Channel Strategy

As previously mentioned, we believe that training a single ProtMD encoder would introduce considerable bias, with the encoded spatiotemporal information potentially failing to make a substantial contribution to the target molecular properties. Therefore, we propose a multichannel spatiotemporal feature adaptive fusion network to balance this bias. To verify that the multichannel fusion strategy outperforms the single-channel strategy, an experiment was designed in which the core components ProtMD_i of the N-ProtMD tool and the full MCST-AFN were each fine-tuned and tested on the QM7 and BBBP data sets. To investigate the true performance of these methods, all methods underwent the self-supervised learning task for future conformation prediction without engaging in the self-supervised atom masking prediction task, and full-parameter fine-tuning was adopted. Figure 9 presents the experimental results, from which it can be observed that the multichannel fusion strategy performs better than the single-channel strategy on both data sets. Compared to the best method under the single-channel strategy, the multichannel fusion strategy achieved a 2.21% improvement on the QM7 data set and a 3.44% improvement on the BBBP data set.

Figure 9. Bar chart illustrating the test results of the multichannel fusion strategy versus single-channel strategies on the QM7 and BBBP data sets. Here, "M" denotes the multichannel fusion method MCST-AFN, "S-p1" indicates the use of a single ProtMD1, "S-p2" the use of a single ProtMD2, and so on.

Single-Conformation Input Strategy vs Multiconformation Input Strategy

In the experimental setup, the N-ProtMD tool accepts the coordinates of a single conformation of the object molecule as input, after which the multiple ProtMD_i components automatically generate several new sets of coordinates while simultaneously outputting multichannel atomic-level features. This process is described as enhancing molecular features with information at different temporal scales. To confirm that the single-conformation input strategy outperforms the multiconformation input strategy, a comparative experiment was designed. For the single-conformation input strategy, the settings are consistent with those described in the "Multichannel Fusion Strategy vs Single-Channel Strategy" section. For the multiconformation input strategy, following the procedures outlined in the "Data Set for the Atomic Masking Prediction Task" section, the ETKDG algorithm from RDKit is utilized to obtain simulated three-dimensional coordinates of the atoms within a molecule. Different random seeds are set to acquire multiple distinct molecular conformations, and the total potential energy corresponding to each conformation is calculated, as shown in the sketch after this paragraph. Based on the fact that molecules tend to exist in states of minimum free energy, the simulated three-dimensional coordinate sets of the molecule were ranked according to the total potential energy of their corresponding conformations, from highest to lowest, and this ranking was used as the input for the multiconformation input strategy. In the multiconformation input strategy, the N-ProtMD tool has several experimental versions; apart from the original version, each employs a number of ProtMD_i encoders equal to the number of input conformations to generate information-enhanced atomic-level features.
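A sketch of the conformer ranking described above, using RDKit's ETKDG embedding with varying random seeds and MMFF total energies (function and parameter names are illustrative):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def ranked_conformers(smiles, n_confs=8):
    """Embed several conformers and rank them by MMFF total potential energy,
    from highest to lowest, mirroring the multiconformation input ordering."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    for seed in range(n_confs):
        params = AllChem.ETKDG()
        params.randomSeed = seed     # different seeds -> distinct conformers
        params.clearConfs = False    # keep previously embedded conformers
        AllChem.EmbedMolecule(mol, params)
    props = AllChem.MMFFGetMoleculeProperties(mol)
    energies = []
    for conf in mol.GetConformers():
        ff = AllChem.MMFFGetMoleculeForceField(mol, props, confId=conf.GetId())
        energies.append((ff.CalcEnergy(), conf.GetId()))
    return [cid for _, cid in sorted(energies, reverse=True)]
```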

Figure 10 illustrates the experimental results: the single-conformation input strategy outperformed the multiconformation input strategy on both the QM7 and BBBP data sets. Notably, on the QM7 data set the inconsistency among input conformations introduced high noise, making it difficult to fuse the multichannel molecular features effectively, so the MAE fluctuated between 139.0 and 143.8 during fine-tuning tests. Compared with the best multiconformation method, the single-conformation input strategy achieved a 61.73% improvement on QM7 and a 7.13% improvement on BBBP.

Figure 10. Bar chart of the test results of the single-conformation input strategy versus the multiconformation input strategies on the QM7 and BBBP data sets. Here, "SI" denotes the single-conformation input version of MCST-AFN, "MI" the multiconformation input version, "MI-p1" the multiconformation input strategy employing eight identical ProtMD1 models, "MI-p2" the strategy employing eight identical ProtMD2 models, and so on.

Visualization of Molecular Representations after Information Enhancement

The t-SNE visualization of the molecular representations learned by the complete MCST-AFN on the QM7 and BBBP data sets is provided. To showcase the best outcomes, molecular features were generated with the model that achieved the optimal evaluation metric after fine-tuning. As illustrated in Figure 11, the molecular representations largely cluster according to their property labels; for instance, the representations from the BBBP data set form two distinct clusters. This demonstrates the effectiveness of enhancing molecular representations through the incorporation of temporal information.

Figure 11. Molecular representations learned by MCST-AFN, visualized with t-SNE. Each point is colored according to its property label. (a) QM7 data set, predicted property: atomization energy (unit: eV). (b) BBBP data set, predicted property: blood-brain barrier permeability.
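The visualization itself can be reproduced with a few lines of scikit-learn and matplotlib; the embedding and label files named here are hypothetical placeholders for the fine-tuned model's pooled molecular representations.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.load("bbbp_embeddings.npy")  # hypothetical file: [n_mols, D] embeddings
labels = np.load("bbbp_labels.npy")        # hypothetical file: [n_mols] property labels

# Project the learned representations to 2D and color by label.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="coolwarm", s=8)
plt.colorbar(label="property label")
plt.tight_layout()
plt.show()
```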

Conclusions

Biological findings indicate that molecular compounds exist in space in dynamic form, and studies of their physicochemical properties should take this into account; molecular dynamics simulations typically bear the responsibility of describing this dynamic process. However, performing direct molecular dynamics simulations on every molecule in a test data set is computationally expensive. Inspired by DLMD and MFL, a multichannel spatiotemporal feature adaptive fusion network framework based on a low-fidelity molecular dynamics model has been proposed that integrates deep learning techniques with molecular dynamics simulations, significantly reducing computational costs while effectively enhancing molecular representation. MCST-AFN was tested across 13 benchmark data sets, achieving an average performance improvement of 2.10% on 12 of them, with the maximum improvement of 19.70% on the ESOL data set. Ablation studies demonstrated the necessity of each component of MCST-AFN, and case studies showed that even though the N-ProtMD tool sacrifices a certain level of accuracy when updating coordinates, its atom-level embedding features still exhibit time specificity and diversity across molecular species. Model interpretability experiments confirmed the effectiveness of the multichannel fusion and single-conformation input strategies.

Despite the excellent performance of MCST-AFN, the N-ProtMD tool, serving as a feature extractor, still faces challenges in updating coordinates, primarily due to the structural limitations of ProtMD and the complexity of the motion patterns of small molecules. Looking ahead, we plan to optimize deep learning-based molecular dynamics models to enhance their fidelity, thereby constructing more accurate four-dimensional molecular representations and further improving the accuracy of molecular property predictions.

Acknowledgments

The work was supported by the National Natural Science Foundation of China (No. 32470985) and the Foreign Youth Talent Program of the Ministry of Science and Technology, China (No. QN2022014011L). We thank our partners for all their help during the research process and the team for their great support.

Appendix

A. Statistics of Atom Types in Downstream Datasets

The basic elements contained in the molecules across all benchmark data sets have been tabulated, as shown in Table A1.

Table A1. Statistics of Atom Types in Benchmark Data Sets.

data set atom types
BBBP Cl, C, N, O, H, F, S, Br, I, Na, P, Ca, B
BACE O, C, N, H, F, S, I, Cl, Br
ClinTox C, O, N, H, Cl, Tc, P, F, S, Se, B, Fe, Al, Br, I, Ca, Pt, Bi, Au, Tl, Cr, Cu, Mn, Zn, Si, Hg, As, Ti
Tox21 C, O, N, S, H, P, Cl, I, Zn, F, Ca, As, Br, B, K, Si, Cu, Mg, Hg, Cr, Zr, Sn, Na, Ba, Au, Pd, Tl, Fe, Al, Gd, Ag, Mo, V, Nd, Co, Yb, Pb, Sb, In, Li, Ni, Bi, Cd, Ti, Se, Dy, Mn, Sr, Be, Pt, Ge
ToxCast O, N, C, Cl, H, Si, Br, Ba, Nd, Dy, In, P, Sb, Co, S, K, Na, B, Ca, Hg, Ni, Se, Tl, Cd, F, Fe, Li, Yb, I, Cr, Sn, Zn, Cu, Pb, As, Bi, Gd, V, Mn, Au, Ti, Zr, Mo, Mg, Eu, Al, Pt, Sr, Sc, Ag, Pd, Be, Ge
SIDER C, N, H, O, S, Cl, F, Tl, I, Ca, P, Gd, Na, K, Mg, Ge, Br, Fe, Au, Ba, Sr, As, Se, Pt, Co, Li, B, Ra, In, Mn, La, Ag, Zn, Tc, Cf, Ga, Sm, Cr, Cu, Y
HIV C, O, Cu, H, N, S, P, Cl, Zn, B, Br, Co, Mn, As, Al, Ni, Se, Si, V, Zr, Sn, I, F, Li, Sb, Fe, Pd, Hg, Bi, Na, Ca, Ti, Ho, Ge, Pt, Ru, Rh, Cr, Ga, K, Ag, Au, Tb, Ir, Te, Mg, Pb, W, Cs, Mo, Re, U, Gd, Tl, Ac
ESOL O, C, N, H, S, Cl, P, F, I, Br
FreeSolv C, N, O, H, S, Cl, Br, P, F, I
Lipop C, N, Cl, H, O, S, F, B, Br, P, I, Si, Se
QM7 C, H, O, N, S
QM8 C, H, N, O, F
QM9 C, H, N, O, F

B. Molecular Dynamics Simulation Experiments

First, the original topology files of the molecules were downloaded from the DrugBank website; the DrugBank IDs of these molecular compounds are listed in Table A2.

Table A2. Overview of Molecules Selected for Molecular Dynamics Simulations.

No. DrugBank ID SMILES
1 DB14201 S(SC1NC2CCCCC2S1)C1NC2CCCC=C2S1
2 DB12015 CC1C(SC(NC(O)N2CCC[C@H]2C(N)O)=N1)C1CC(=NCC1)C(C)(C)C(F)(F)F
3 DB01122 CC[N+](CC)(CCNC(O)C(O)NCC[N+](CC)(CC)CC1=CC=CC=C1Cl)CC1=CC=CC=C1Cl
4 DB06742 NC1C(Br)CC(Br)C=C1CN[C@H]1CC[C@H](O)CC1
5 DB06767 [NH4+].[Cl-]
6 DB00613 CCN(CC)CC1C(O)CCC(NC2=C3C=CC(Cl)=CC3=NC=C2)=C1
7 DB13853 COC1=CC=C(C=C1)C1=CC(=S)SS1
8 DB00972 CN1CCCC(CC1)N1N=C(CC2=CC=C(Cl)CC2)C2=CC=CCC2C1=O
9 DB00181 NCC(CC(O)O)C1=CC=C(Cl)CC1
10 DB00436 NS(=O)(=O)C1=CC2=C(NC(CC3=CC=CC=C3)NS2(=O)O)CC1C(F)(F)F
11 DB13277 CCC1=C(C(O)C2=CC(I)C(O)C(I)=C2)C2=CC=CC=C2O1
12 DB09225 CN(C)CCOC1=CC2=CC=CC=C2SC2=CC=C(Cl)C=C12
13 DB12026 CC[C@@H]1[C@@H]2CN([C@@H]1C(O)N[C@@]1(C[C@H]1C(F)F)C(O)NS(=O)(=O)C1(C)CC1)C(O)[C@@H](NC(O)O[C@@H]1C[C@H]1CCCCC(F)(F)C1=C(O2)N=C2CC(OC)C=CC2=N1)C(C)(C)C
14 DB08828 CS(=O)(=O)C1=CC(Cl)C(C=C1)C(O)NC1=CC=C(Cl)C(=C1)C1=CC=CC=N1
15 DB08881 CCCS(=O)(=O)NC1=C(F)C(C(O)C2=CNC3=NC=C(C=C23)C2=CC=C(Cl)C=C2)C(F)CC1
16 DB01021 NS(=O)(=O)C1=C(Cl)C=C2NC(NS(=O)(=O)C2=C1)C(Cl)Cl
17 DB00831 CN1CCN(CCCN2C3=CC=CC=C3SC3=C2C=C(C=C3)C(F)(F)F)CC1
18 DB11678 CS(=O)(=O)OC[C@H](O)[C@@H](O)COS(C)(=O)O
19 DB13222 CC1=C2C=CC=NC2=C(O)C(Br)=C1
20 DB12095 CCOC(O)[C@@H](N)CC1CCC(C=C1)C1=NC(N)=NC(O[C@H](C2=CC=C(Cl)C=C2N2C=CC(C)=N2)C(F)(F)F)=C1
21 DB01145 OS(=O)CNC1=CC=C(C=C1)S(=O)(=O)C1=CC=C(NCS(O)O)C=C1
22 DB00398 CNC(O)C1=NC=CC(OC2=CC=C(NC(O)NC3=CC(=C(Cl)CC3)C(F)(F)F)C=C2)=C1
23 DB01105 CC(C)CC(N(C)C)C1(CCC1)C1=CC=C(Cl)C=C1
24 DB11182 OC(O)C1=C(Cl)C(Cl)C(Cl)C(Cl)=C1C1=C2C=C(I)C(O)C(I)=C2OC2=C(I)C(O)C(I)C=C12
25 DB01178 CN1C(C2=CC=C(Cl)C=C2)S(=O)(=O)CCC1=O
26 DB00880 NS(=O)(=O)C1=C(Cl)C=C2NC=NS(=O)(=O)C2=C1
27 DB00369 NC1=NC(O)N(C[C@@H](CO)OCP(O)(O)O)C=C1
28 DB01013 [H][C@@]12C[C@H](C)[C@](OC(O)CC)(C(O)CCl)[C@@]1(C)C[C@H](O)[C@@]1(F)[C@@]2([H])CCC2=CC(O)C=C[C@]12C
29 DB01987 CC1=C(CCO[P@](O)(=O)O[P@](O)([O-])O)SC=[N+]1CC1=CN=C(C)N=C1N
30 DB11943 NC1=NC(N2C=C(C(O)O)C(O)C3=CC(F)C(N4CC(O)C4)C(Cl)=C23)C(F)C=C1F
31 DB00879 [H][C@@]1(CO)O[C@@]([H])(CS1)N1C=C(F)C(=N)N=C1O
32 DB00228 FC(F)OC(F)(F)C(F)Cl
33 DB13228 CN1C=C(C(O)C2=C1C=C(F)C=C2)S(C)O
34 DB00301 [H][C@]12SC(C)(C)[C@@H](N1C(O)[C@H]2NC(O)C1=C(C)ON=C1C1=C(Cl)C=CC=C1F)C(O)O
35 DB00623 OCCN1CCN(CCCN2C3=CC=CC=C3SC3=C2C=C(C=C3)C(F)(F)F)CC1
36 DB00317 COC1=C(OCCCN2CCOCC2)C=C2C(NC3=CC(Cl)C(F)C=C3)=NC=NC2=C1
37 DB01016 COC1=C(C=C(Cl)C=C1)C(O)NCCC1=CC=C(C=C1)S(=O)(=O)NC(O)NC1CCCCC1
38 DB00793 ClC1=CC(Cl)C(Cl)C=C1OCC#CI
39 DB01159 [H]C(Cl)(Br)C(F)(F)F
40 DB00753 FC(F)OC(Cl)C(F)(F)F
41 DB00677 CC(C)OP(F)(=O)OC(C)C
42 DB01259 CS(=O)(=O)CCNCC1=CC=C(O1)C1=CC2=C(C=C1)N=CN=C2NC1=CC(Cl)C(OCC2=CC(F)=CC=C2)C=C1
43 DB11560 OC(O)CSC1=NN=C(Br)N1C1=CC=C(C2CC2)C2=C1C=CC=C2
44 DB11611 CS(=O)(=O)C1=CC(C[C@H](NC(O)C2=C(Cl)C3=C(CN(CC3)C(O)C3=CC=C4C=COC4=C3)C=C2Cl)C(O)O)=CC=C1
45 DB08932 CCCNS(=O)(=O)NC1=C(C(OCCOC2=NC=C(Br)C=N2)=NC=N1)C1=CC=C(Br)C=C1
46 DB14078 OP(O)(=O)CP(O)(O)O
47 DB01028 COC(F)(F)C(Cl)Cl
48 DB14197 CN1SC(Cl)=CC1=O
49 DB16236 O=C(N1CCN(CC2CC2)CC1)C1=CC=C(NS(=O)(=O)C2=CC=CC3=C2N=CC=C3)C=C1
50 DB00471 OC(O)CC1(CC1)CS[C@H](CCC1=CC=CC=C1C(O)(C)C)C1=CC=CC(\C=C\C2=NC3=C(C=CC(Cl)=C3)C=C2)=C1

The topology files were preprocessed using the Amber tool suite, after which dynamics simulations were run on the processed molecules. Under the NPT ensemble with periodic boundary conditions, a 50 ns run for one molecular compound takes approximately 0.5 h on a single RTX 2080 Ti GPU. The GAFF force field, a general-purpose force field for small molecules, was employed. Throughout the minimization, heating, and production stages, a non-bonded interaction cutoff of cut = 8.0 Å was used; larger cutoffs increase simulation time. The specific steps are as follows:

(1) The solvated system was minimized for 5000 steps (maxcyc = 5000), with pmemd applying the steepest descent algorithm for the first ncyc = 2500 steps and the conjugate gradient algorithm for the remainder.

(2) The system was gradually heated from 0 to 303.15 K over a period of 9 ns, after which the heated system was equilibrated in the NPT ensemble for 1 ns.

(3) A 50 ns production simulation was conducted at 303.15 K under periodic boundary conditions, with structural coordinates collected every 10 ps.

Since the production simulation files contain water solvent, the cpptraj program was used to remove the water molecules, retaining only the selected molecular compounds.
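For concreteness, a production-stage Amber input file and a water-stripping cpptraj script consistent with the protocol above might look as follows; settings not stated in the text (the 2 fs time step, SHAKE, the Langevin thermostat, and the file names) are assumptions.

```
# prod.in -- illustrative 50 ns NPT production input (assumed details noted)
&cntrl
  imin=0, irest=1, ntx=5,        ! continue from the equilibrated restart
  nstlim=25000000, dt=0.002,     ! 50 ns at an assumed 2 fs time step
  ntc=2, ntf=2,                  ! SHAKE on bonds to hydrogen (assumed)
  temp0=303.15, ntt=3, gamma_ln=2.0,
  ntb=2, ntp=1,                  ! periodic boundaries, constant pressure
  cut=8.0,                       ! non-bonded cutoff (Angstroms)
  ntwx=5000,                     ! write coordinates every 10 ps
 /
```

```
# strip_water.cpptraj -- remove the solvent, keep the solute
parm mol.prmtop
trajin prod.nc
strip :WAT
trajout prod_dry.nc
run
```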

C. ProtMD Architecture

The original ProtMD model was designed to accept inputs from two objects, while the downstream task required it to accept inputs from a single object. Therefore, certain modifications were made to the ProtMD model to adapt it for molecular property prediction tasks, without altering the model’s name. The specific network layers of the modified ProtMD model are formally defined as follows

$$m_{ji} = \phi_e\left(h_i^{(t),l},\; h_j^{(t),l},\; x_{ij}^{(t),l}\right) \tag{A.1}$$

$$\mu_{ji} = a_{ji}\, h_j^{(t),l} \cdot \phi_d\left(x_{ij}^{(t),l}\right) \tag{A.2}$$

$$x_i^{(t),l+1} = x_i^{(t),l} + \sum_{j}\left(x_i^{(t),l} - x_j^{(t),l}\right)\phi_x\left(m_{ij}\right) \tag{A.3}$$

$$h_i^{(t),l+1} = \phi_h\Big(h_i^{(t),l},\; \textstyle\sum_j m_{ji},\; \sum_j \mu_{ji}\Big) \tag{A.4}$$

In these equations, l indexes the network layer (l ∈ [L]) and x denotes the atomic coordinates. φ_e operates on the relative distances between atoms. φ_h operates on the nodes, aggregating the graph messages m_i = Σ_j m_ji, the attention-weighted messages μ_i = Σ_j μ_ji, and the node embeddings h_j into the updated node embeddings h_i. In contrast to the original approach, here φ_x(m_ij) is simply used as a weight to sum all relative displacements (x_i − x_j), which are added to the initial atomic positions to yield the updated coordinates x_i. An attention mechanism is employed when updating the atomic features: φ_d acts on the interatomic relative distances x_ij, and a_ji denotes the attention weights produced by the trainable MLPs φ_q and φ_k, formulated as follows

$$a_{ji} = \frac{\exp\left(\left\langle \phi_q\!\left(h_i^{(t),l}\right),\, \phi_k\!\left(h_j^{(t),l}\right)\right\rangle\right)}{\sum_{j'} \exp\left(\left\langle \phi_q\!\left(h_i^{(t),l}\right),\, \phi_k\!\left(h_{j'}^{(t),l}\right)\right\rangle\right)} \tag{A.5}$$

The enhanced ProtMD model takes the atomic embedding set {h^{(t)}} and the 3D coordinate set {x^{(t)}} as inputs and outputs the next time frame {(x^{(t+i)}, h^{(t+i)})}, where i denotes the customized time span.
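A dense, single-molecule PyTorch sketch of the modified layer defined by eqs A.1–A.5 is given below; the hidden sizes, the SiLU nonlinearities, the all-pairs (fully connected) graph, and the dot-product form of the φ_q/φ_k attention are assumptions made for illustration and need not match the released implementation.

```python
import torch
import torch.nn as nn

class ProtMDLayer(nn.Module):
    """One modified ProtMD message-passing layer (eqs A.1-A.5), dense graph."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden))            # eq A.1
        self.phi_d = nn.Sequential(nn.Linear(1, dim), nn.SiLU())         # eq A.2
        self.phi_x = nn.Linear(hidden, 1)                                # eq A.3
        self.phi_h = nn.Sequential(nn.Linear(2 * dim + hidden, hidden),
                                   nn.SiLU(), nn.Linear(hidden, dim))    # eq A.4
        self.phi_q = nn.Linear(dim, dim)                                 # eq A.5
        self.phi_k = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, x: torch.Tensor):
        # h: [N, dim] atom embeddings, x: [N, 3] coordinates.
        rel = x.unsqueeze(1) - x.unsqueeze(0)             # [N, N, 3], x_i - x_j
        dist = rel.norm(dim=-1, keepdim=True)             # [N, N, 1] relative distances
        hi = h.unsqueeze(1).expand(-1, h.size(0), -1)     # h_i broadcast over j
        hj = h.unsqueeze(0).expand(h.size(0), -1, -1)     # h_j broadcast over i

        m = self.phi_e(torch.cat([hi, hj, dist], dim=-1))         # A.1: m_ji
        att = (self.phi_q(h) @ self.phi_k(h).T).softmax(dim=-1)   # A.5: a_ji
        mu = att.unsqueeze(-1) * hj * self.phi_d(dist)            # A.2: mu_ji

        x_new = x + (rel * self.phi_x(m)).sum(dim=1)              # A.3: coordinates
        h_new = self.phi_h(torch.cat([h, m.sum(dim=1),
                                      mu.sum(dim=1)], dim=-1))    # A.4: embeddings
        return h_new, x_new
```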

D. Ten-Fold Cross-Validation Results

To demonstrate the robustness of MCST-AFN, additional ten-fold cross-validation results on the 13 MoleculeNet benchmark data sets are provided in Figure A1, in which the optimal value of the evaluation metric for each data set is marked. The results on the FreeSolv data set fluctuate considerably, which is attributable to its small sample size. Overall, MCST-AFN shows good robustness.
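A hedged sketch of the 10-fold protocol follows; build_model, train_model, and evaluate are hypothetical stand-ins for the MCST-AFN fine-tuning pipeline, not functions from the released code.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(dataset, n_splits: int = 10, seed: int = 0):
    """Return the mean and std of the evaluation metric over k folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(np.arange(len(dataset))):
        model = build_model()                                # hypothetical factory
        train_model(model, [dataset[i] for i in train_idx])  # hypothetical fine-tuning
        scores.append(evaluate(model, [dataset[i] for i in test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```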

Figure A1. 10-fold cross-validation results of MCST-AFN on 13 benchmark data sets from MoleculeNet. (a) ROC-AUC performance across seven classification data sets (BACE, BBBP, ClinTox, Tox21, ToxCast, SIDER, and HIV). (b) RMSE performance across three regression data sets (ESOL, FreeSolv, and Lipop). (c) MAE performance on the QM7 data set. (d) MAE performance on the QM8 data set. (e) MAE performance on the QM9 data set.


X.C. and W.L. contributed equally to this work and should be considered co-first authors.

Compliance with Ethics Requirements: This article does not contain any studies with human or animal subjects.

The authors declare no competing financial interest.

References

  1. De Cao, N.; Kipf, T. MolGAN: An implicit generative model for small molecular graphs, 2018. arXiv:1805.11973. https://arxiv.org/abs/1805.11973.
  2. Li Y., Zhang L., Liu Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminf. 2018;10:1–24. doi: 10.1186/s13321-018-0287-6.
  3. Li R., Wang S., Zhu F., Huang J. Adaptive graph convolutional neural networks. Proceedings of the AAAI Conference on Artificial Intelligence. 2018;32(1). doi: 10.1609/aaai.v32i1.11691.
  4. Cang Z., Wei G. W. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Computational Biology. 2017;13(7):e1005690. doi: 10.1371/journal.pcbi.1005690.
  5. Xu, Z.; Wang, S.; Zhu, F.; Huang, J. Seq2seq fingerprint: An unsupervised deep molecular embedding for drug discovery. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017; pp 285–294.
  6. Winter R., Montanari F., Noé F., Clevert D. A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 2019;10(6):1692–1701. doi: 10.1039/C8SC04175J.
  7. Putin E., Asadulaev A., Vanhaelen Q., Ivanenkov Y., Aladinskaya A. V., Aliper A., Zhavoronkov A. Adversarial threshold neural computer for molecular de novo design. Mol. Pharmaceutics. 2018;15(10):4386–4397. doi: 10.1021/acs.molpharmaceut.7b01137.
  8. Popova M., Isayev O., Tropsha A. Deep reinforcement learning for de novo drug design. Science Advances. 2018;4(7):eaap7885. doi: 10.1126/sciadv.aap7885.
  9. Jiang J., Wang R., Wang M., Gao K., Nguyen D. D., Wei G. W. Boosting tree-assisted multitask deep learning for small scientific datasets. J. Chem. Inf. Model. 2020;60(3):1235–1244. doi: 10.1021/acs.jcim.9b01184.
  10. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988;28(1):31–36. doi: 10.1021/ci00057a005.
  11. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; John Wiley & Sons, 2008.
  12. Rogers D., Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010;50(5):742–754. doi: 10.1021/ci100050t.
  13. Gao K., Nguyen D. D., Sresht V., Mathiowetz A. M., Tu M., Wei G. W. Are 2D fingerprints still valuable for drug discovery? Phys. Chem. Chem. Phys. 2020;22(16):8373–8390. doi: 10.1039/D0CP00305K.
  14. Durant J. L., Leland B. A., Henry D. R., Nourse J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002;42(6):1273–1280. doi: 10.1021/ci010132r.
  15. Nguyen D. D., Wei G. W. AGL-score: algebraic graph learning score for protein–ligand binding scoring, ranking, docking, and screening. J. Chem. Inf. Model. 2019;59(7):3291–3304. doi: 10.1021/acs.jcim.9b00334.
  16. Chen D., Gao K., Nguyen D. D., Chen X., Jiang Y., Wei G. W., Pan F. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nat. Commun. 2021;12(1):3521. doi: 10.1038/s41467-021-23720-w.
  17. Hopfinger A. J., Wang S., Tokarski J. S., Jin B., Albuquerque M., Madhav P. J., Duraiswami C. Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J. Am. Chem. Soc. 1997;119(43):10509–10524. doi: 10.1021/ja9718937.
  18. Martins J. P. A., Barbosa E. G., Pasqualoto K. F., Ferreira M. M. LQTA-QSAR: a new 4D-QSAR methodology. J. Chem. Inf. Model. 2009;49(6):1428–1436. doi: 10.1021/ci900014f.
  19. Bak A. Two decades of 4D-QSAR: A dying art or staging a comeback? Int. J. Molecular Sci. 2021;22(10):5212. doi: 10.3390/ijms22105212.
  20. Wu F., Li S. Z. DIFFMD: a geometric diffusion model for molecular dynamics simulations. Proceedings of the AAAI Conference on Artificial Intelligence. 2023;37:5321–5329. doi: 10.1609/aaai.v37i4.25663.
  21. Grimme S. Exploration of chemical compound, conformer, and reaction space with meta-dynamics simulations based on tight-binding quantum chemical calculations. J. Chem. Theory Comput. 2019;15(5):2847–2862. doi: 10.1021/acs.jctc.9b00143.
  22. Axelrod S., Gomez-Bombarelli R. Molecular machine learning with conformer ensembles. Machine Learning: Sci. Technol. 2023;4(3):035025. doi: 10.1088/2632-2153/acefa7.
  23. Wu F., Jin S., Jiang Y., Jin X., Tang B., Niu Z., Liu X., Zhang Q., Zeng X., Li S. Z. Pre-Training of Equivariant Graph Matching Networks with Conformation Flexibility for Drug Binding. Adv. Sci. 2022;9(33):2203796. doi: 10.1002/advs.202203796.
  24. Case D. A., Aktulga H. M., Belfon K., Cerutti D. S., Cisneros G. A., Cruzeiro V. W. D., Forouzesh N., Giese T. J., Götz A. W., Gohlke H., Izadi S., Kasavajhala K., Kaymak M. C., King E., Kurtzman T., Lee T.-S., Li P., Liu J., Luchko T., Luo R., Manathunga M., Machado M. R., Nguyen H. M., O’Hearn K. A., Onufriev A. V., Pan F., Pantano S., Qi R., Rahnamoun A., Risheh A., Schott-Verdugo S., Shajan A., Swails J., Wang J., Wei H., Wu X., Wu Y., Zhang S., Zhao S., Zhu Q., Cheatham T. E., III, Roe D. R., Roitberg A., Simmerling C., York D. M., Nagan M. C., Merz K. M., Jr. AmberTools. J. Chem. Inf. Model. 2023;63(20):6183–6191. doi: 10.1021/acs.jcim.3c01153.
  25. Case, D. A.; Aktulga, H. M.; Belfon, K.; Ben-Shalom, I. Y.; Berryman, J. T.; Brozell, S. R.; Cerutti, D. S.; Cheatham, T. E., III; Cisneros, G. A.; Cruzeiro, V. W. D.; Darden, T. A.; Forouzesh, N.; Ghazimirsaeed, M.; Giambaşu, G.; Giese, T.; Gilson, M. K.; Gohlke, H.; Goetz, A. W.; Harris, J.; Huang, Z.; Izadi, S.; Izmailov, S. A.; Kasavajhala, K.; Kaymak, M. C.; Kovalenko, A.; Kurtzman, T.; Lee, T. S.; Li, P.; Li, Z.; Lin, C.; Liu, J.; Luchko, T.; Luo, R.; Machado, M.; Manathunga, M.; Merz, K. M.; Miao, Y.; Mikhailovskii, O.; Monard, G.; Nguyen, H.; O’Hearn, K. A.; Onufriev, A.; Pan, F.; Pantano, S.; Rahnamoun, A.; Roe, D. R.; Roitberg, A.; Sagui, C.; Schott-Verdugo, S.; Shajan, A.; Shen, J.; Simmerling, C. L.; Skrynnikov, N. R.; Smith, J.; Swails, J.; Walker, R. C.; Wang, J.; Wang, J.; Wu, X.; Wu, Y.; Xiong, Y.; Xue, Y.; York, D. M.; Zhao, C.; Zhu, Q.; Kollman, P. A. Amber 2024; University of California: San Francisco, 2024.
  26. Buterez D., Janet J. P., Kiddle S. J., Oglic D., Lió P. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat. Commun. 2024;15(1):1517. doi: 10.1038/s41467-024-45566-8.
  27. Woo, S.; Park, J.; Lee, J. Y.; Kweon, I. S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), 2018; pp 3–19.
  28. Zhang X. C., Wu C. K., Yang Z. J., Wu Z. X., Yi J. C., Hsieh C. Y., Hou T. J., Cao D. S. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Briefings Bioinf. 2021;22(6):bbab152. doi: 10.1093/bib/bbab152.
  29. Brown, T. B. Language Models are Few-Shot Learners, 2020. arXiv:2005.14165. https://arxiv.org/abs/2005.14165.
  30. Satorras, V. G.; Hoogeboom, E.; Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning; PMLR, 2021; pp 9323–9332.
  31. Wishart D. S., Feunang Y. D., Guo A. C., Lo E. J., Marcu A., Grant J. R., Sajed T., Johnson D., Li C., Sayeeda Z., et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D1082. doi: 10.1093/nar/gkx1037.
  32. Sterling T., Irwin J. J. ZINC 15–ligand discovery for everyone. J. Chem. Inf. Model. 2015;55(11):2324–2337. doi: 10.1021/acs.jcim.5b00559.
  33. Landrum G. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling, 2013. https://www.rdkit.org.
  34. Riniker S., Landrum G. A. Better informed distance geometry: using what we know to improve conformation generation. J. Chem. Inf. Model. 2015;55(12):2562–2574. doi: 10.1021/acs.jcim.5b00654.
  35. Halgren T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996;17(5–6):490–519. doi: 10.1002/(SICI)1096-987X(199604)17:5/6<490::AID-JCC1>3.0.CO;2-P.
  36. Wu Z., Ramsundar B., Feinberg E. N., Gomes J., Geniesse C., Pappu A. S., Leswing K., Pande V. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018;9(2):513–530. doi: 10.1039/C7SC02664A.
  37. Mobley D. L., Guthrie J. P. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J. Comput.-Aided Mol. Des. 2014;28:711–720. doi: 10.1007/s10822-014-9747-x.
  38. Delaney J. S. ESOL: estimating aqueous solubility directly from molecular structure. J. Chem. Inf. Comput. Sci. 2004;44(3):1000–1005. doi: 10.1021/ci034243x.
  39. Gaulton A., Bellis L. J., Bento A. P., Chambers J., Davies M., Hersey A., Light Y., McGlinchey S., Michalovich D., Al-Lazikani B., Overington J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40(D1):D1100–D1107. doi: 10.1093/nar/gkr777.
  40. Blum L. C., Reymond J. L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009;131(25):8732–8733. doi: 10.1021/ja902302h.
  41. Ramakrishnan R., Hartmann M., Tapavicza E., von Lilienfeld O. A. Electronic spectra from TDDFT and machine learning in chemical space. J. Chem. Phys. 2015;143(8):084111. doi: 10.1063/1.4928757.
  42. Ruddigkeit L., Van Deursen R., Blum L. C., Reymond J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012;52(11):2864–2875. doi: 10.1021/ci300415d.
  43. Kipf, T. N.; Welling, M. Semi-supervised classification with graph convolutional networks, 2016. arXiv:1609.02907. https://arxiv.org/abs/1609.02907.
  44. Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning; PMLR, 2017; pp 1263–1272.
  45. Yang K., Swanson K., Jin W., Coley C., Eiden P., Gao H., Guzman-Perez A., Hopper T., Kelley B., Mathea M., et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019;59(8):3370–3388. doi: 10.1021/acs.jcim.9b00237.
  46. Dai, H.; Dai, B.; Song, L. Discriminative embeddings of latent variable models for structured data. In International Conference on Machine Learning; PMLR, 2016; pp 2702–2711.
  47. Wang Y., Wang J., Cao Z., Barati Farimani A. Molecular contrastive learning of representations via graph neural networks. Nat. Machine Intelligence. 2022;4(3):279–287. doi: 10.1038/s42256-022-00447-x.
  48. Fang Y., Zhang Q., Yang H., Zhuang X., Deng S., Zhang W., Qin M., Chen Z., Fan X., Chen H. Molecular contrastive learning with chemical element knowledge graph. Proceedings of the AAAI Conference on Artificial Intelligence. 2022;36:3968–3976. doi: 10.1609/aaai.v36i4.20313.
  49. Fang Y., Zhang Q., Zhang N., Chen Z., Zhuang X., Shao X., Fan X., Chen H. Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nature Machine Intelligence. 2023;5(5):542–553. doi: 10.1038/s42256-023-00654-0.
  50. Stärk, H.; Beaini, D.; Corso, G.; Tossou, P.; Dallago, C.; Günnemann, S.; Liò, P. 3D Infomax improves GNNs for molecular property prediction. In International Conference on Machine Learning; PMLR, 2022; pp 20479–20502.
  51. Zhou, G.; Gao, Z.; Ding, Q.; Zheng, H.; Xu, H.; Wei, Z.; Zhang, L.; Ke, G. Uni-Mol: A universal 3D molecular representation learning framework. In International Conference on Learning Representations, 2023.
  52. Fang X., Liu L., Lei J., He D., Zhang S., Zhou J., Wang F., Wu H., Wang H. Geometry-enhanced molecular representation learning for property prediction. Nat. Machine Intelligence. 2022;4(2):127–134. doi: 10.1038/s42256-021-00438-4.
  53. Schrödinger, L.; DeLano, W. PyMOL, 2020. http://www.pymol.org/pymol.
  54. Cheng K., Liu C., Su Q., Wang J., Zhang L., Tang Y., Qi Y., et al. 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance. Proceedings of the AAAI Conference on Artificial Intelligence. 2025;39:93–101. doi: 10.1609/aaai.v39i1.31984.


Data Availability Statement

The data and our MCST-AFN model can be downloaded from https://github.com/NTU-MedAI/MCST-AFN.

