Abstract
Background: In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on the most relevant molecules. Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug–target interaction prediction, 2.6% on drug property prediction, 1.2% on drug–drug interaction prediction, and 4.1% on protein–protein interaction prediction. Through qualitative analysis, we reveal KEDD’s promising potential in assisting real-world applications. Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.
Introduction
Drug discovery aims to design novel therapeutic agents that act against a certain disease while reducing potential side effects on patients [1–3]. The understanding of biomolecules, including both drugs and proteins, builds the foundation of drug discovery processes [4]. Such molecular expertise usually resides within three different modalities: molecular structures, such as SMILES strings of molecules and amino acid sequences of proteins [5]; structured knowledge from knowledge graphs [6]; and unstructured knowledge from biomedical documents [7]. These modalities complement each other, providing a holistic view that guides researchers in pharmaceutical applications.
While artificial intelligence (AI) models that mine intrinsic patterns from molecular structures and protein sequences [8–11] have achieved great success in assisting drug discovery, recent advances of multimodal models have shown the benefits of incorporating structured and unstructured knowledge in numerous downstream tasks, including drug–target interaction prediction (DTI) [12–14], drug–drug interaction prediction (DDI) [15–17], and protein–protein interaction prediction (PPI) [18,19]. However, existing models are mostly restricted to a single task, and none of them attempt to take advantage of both structured and unstructured knowledge. This limits not only the application scope but also the capability of AI systems to holistically understand the intrinsic properties and functions of biomolecules. Besides, multimodal knowledge is occasionally unavailable for newly discovered drugs and proteins due to the extensive cost of manual annotations. This formidable challenge, known as the missing modality problem [20–22], hampers the capability of multimodal deep learning models in assisting real-world drug development.
In this work, we propose KEDD, a unified end-to-end deep learning framework for Knowledge-Empowered Drug Discovery to solve the aforementioned problems. KEDD simultaneously harvests biomedical expertise from molecular structures, structured knowledge from knowledge graphs, and unstructured knowledge from biomedical literature. KEDD could be flexibly applied to a wide range of AI drug discovery tasks. The framework first incorporates independent off-the-shelf representation learning models to extract dense features from each modality. Then, it performs feature fusion by concatenating the multimodal features and calculates the results with a prediction network. To alleviate the missing modality problem for structured knowledge, KEDD leverages multihead sparse attention to reconstruct features based on the most relevant biomolecules, and proposes a modality masking technique to improve the training of sparse attention.
Comprehensive experiments on 13 popular benchmarks demonstrate KEDD’s capability in solving wide downstream tasks in AI drug discovery. KEDD outperforms state-of-the-art models by an average of 5.2% on DTI, 2.6% on drug property prediction (DP), 1.2% on DDI, and 4.1% on PPI. Additionally, qualitative results shed light on KEDD’s joint comprehension of different modalities and its potential in assisting real-world applications.
Our main contributions are summarized as follows:
• We present KEDD, a unified, end-to-end framework incorporating multimodal knowledge of molecular structure, structured knowledge within knowledge graphs, and unstructured knowledge within biomedical documents for drug discovery.
• We propose sparse attention and modality masking to alleviate the missing modality problem for knowledge graphs.
• We demonstrate the state-of-the-art performance of KEDD in wide-ranging AI drug discovery tasks.
Methods
In this section, we start with a brief introduction of preliminaries and notations, followed by an overview of the overall architecture of KEDD. Then, we detail two strategies to incorporate structured and unstructured knowledge: direct searching and reconstruction via sparse attention. Finally, we present the implementation details of KEDD on several downstream benchmarks.
Preliminaries
KEDD focuses on two types of biomolecules involved in drug discovery: drugs and proteins. Each component further consists of information from three modalities, namely, molecular structure, structured knowledge, and unstructured knowledge. Formally:
$$ d = (D_S, D_{SK}, D_{UK}) \in \mathcal{D}, \qquad p = (P_S, P_{SK}, P_{UK}) \in \mathcal{P} \tag{1} $$
where d refers to a drug, p refers to a protein, and D and P refer to the drug and protein spaces. The drug structure DS is profiled as a two-dimensional (2D) molecular graph (V, E), where V denotes atoms and E denotes molecular bonds. The protein structure PS is profiled as a sequence [p1, p2, …, pM] of length M, where pi corresponds to an amino acid. The knowledge base is formulated as KB = (E, R), where E is the entity set and R is the set of triplets (h, r, t); h, t ∈ E are the head and tail entities, respectively, and r is the relation type. The structured knowledge DSK ∈ E or PSK ∈ E is formulated as the corresponding entity in the knowledge base. The unstructured knowledge DUK or PUK is formulated as a text sequence [t1, t2, ⋯, tL] of length L.
AI drug discovery tasks aim to uncover the properties of novel drugs and proteins, as well as the interactions between them. They can be formulated as learning mapping functions from the drug, protein, or joint spaces to binary values. Formally:
• DTI predicts whether a given drug binds to a specific protein target. This task sheds light on improving the effectiveness of drugs and reducing their toxicity to the human body [23]. The task is formulated as learning FDTI : D × P → {0, 1}.
• DP predicts the existence of biomolecular properties such as toxicity, permeability, and side effects. The task is formulated as learning FDP : D → {0, 1}.
• DDI predicts whether two drugs interact with each other, which plays an important role in co-administration. The task is formulated as learning FDDI : D × D → {0, 1}.
• PPI aims at predicting different types of interaction relationships between proteins mainly based on their amino acid sequences. The task is beneficial to applications such as identifying the functions and druggability of biomolecules [24]. The task is formulated as learning FPPI : P × P → {0, 1}n, where n is the number of relation types.
For DTI, DDI, and PPI, the binary output signifies the presence of a particular category of interaction between the provided drugs or proteins. For DP, the binary output indicates if the molecule holds a specific property. Due to their similar formulations, we endeavor to build a unified end-to-end deep learning framework to solve these tasks with minimal modifications.
KEDD architecture
Figure 1 illustrates the overall KEDD architecture. In the following section, we detail each component of KEDD.
Fig. 1.

The KEDD architecture. (A) The overall feature fusion framework. Inputs for molecules can be a drug, a protein, or empty depending on the downstream task. (B) Network architecture of the drug structure encoder GraphMVP. (C) Network architecture of the protein structure encoder MCNN. (D) Workflow of the structured knowledge encoder ProNE. (E) Network architecture of the unstructured knowledge encoder PubMedBERT.
Drug structure encoder
To encode the molecular graph DS = (V, E), we use GraphMVP [8], a five-layer GIN [25] pretrained on both 2D molecular graphs and 3D molecular geometries. As illustrated in Fig. 1B, GraphMVP first calculates the initial node embedding matrix based on the type and chirality of each atom. Then, each layer of the GIN propagates the node features from the previous layer in a message-passing manner. Specifically, at the kth layer, it first calculates the edge embedding matrix based on the bond type and bond direction. Then, the node features are updated as follows:
$$ h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\sum_{u \in \mathcal{N}(v)\cup\{v\}} h_u^{(k-1)} \;+\; \sum_{u \in \mathcal{N}(v)} e_j^{(k)}\Big) \tag{2} $$
where j denotes the corresponding edge connecting u and v, and MLP(k) is a trainable network composed of a fully connected layer, a ReLU activation, and another fully connected layer. The structure feature is calculated by mean pooling over the node features of the last layer:
$$ z_S^{D} = \frac{1}{|V|}\sum_{v \in V} h_v^{(K)} \tag{3} $$
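The message-passing update and mean pooling above can be sketched in plain NumPy. This is an illustrative simplification, not the pretrained GraphMVP encoder itself: the real model uses five GIN layers with learned atom and bond embeddings.

```python
import numpy as np

def gin_layer(h, edges, edge_emb, mlp):
    """One GIN-style message-passing update (a simplified sketch of Eq. 2).

    h:        (num_nodes, d) node features from the previous layer
    edges:    list of (u, v) index pairs for molecular bonds (undirected)
    edge_emb: (num_edges, d) bond embeddings
    mlp:      trainable network applied to the aggregated messages
    """
    agg = h.copy()  # the self term u = v
    for j, (u, v) in enumerate(edges):
        agg[v] += h[u] + edge_emb[j]  # message along bond j into v
        agg[u] += h[v] + edge_emb[j]  # and the reverse direction
    return mlp(agg)

def mean_pool(h):
    """Graph-level structure feature (Eq. 3): average the final node features."""
    return h.mean(axis=0)
```

Stacking several such layers and pooling the last one yields the drug structure feature.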
Protein structure encoder
To encode the protein structure PS = [p1, p2, ⋯, pM], we use the multiscale convolutional neural network (MCNN) [26], a network with three branches of stacked convolutional layers. The MCNN architecture is shown in Fig. 1C. It first incorporates an embedding layer to transform PS into an embedding matrix X. Then, it passes X to each branch, which comprises one, two, and three convolution layers, respectively, with a kernel size of 3 × 3 and a ReLU activation after each convolution layer. Finally, it applies max pooling over the sequence, concatenates the outputs from each branch, and feeds the concatenated result into a fully connected layer. Formally, the structural feature of a protein is calculated as follows:
$$ z_S^{P} = \big(M(F_1(X)) \oplus M(F_2(X)) \oplus M(F_3(X))\big)\, W_P \tag{4} $$
where F1, F2, F3 are three branches of stacked convolution layers followed by ReLU activation, ⊕ denotes concatenation, M(⋅) denotes max pooling, and WP ∈ ℝ384 × 128 is a trainable matrix.
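The three-branch computation above can be sketched in NumPy. This is an illustrative reimplementation under simplified assumptions (1-D convolution over the residue axis, no trained weights):

```python
import numpy as np

def conv1d_relu(x, w):
    """'Same'-padded 1-D convolution over the residue axis, then ReLU.
    x: (L, d_in), w: (k, d_in, d_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kd,kdo->o", xp[i:i + k], w)
                    for i in range(x.shape[0])])
    return np.maximum(out, 0.0)

def mcnn_encode(x, branches, w_p):
    """Multiscale CNN protein encoder (a sketch of Eq. 4).

    branches: three lists of conv kernels (one, two, and three layers);
    each branch output is max-pooled over the sequence, the pooled
    vectors are concatenated, and a final linear layer projects them.
    """
    pooled = []
    for kernels in branches:
        h = x
        for w in kernels:
            h = conv1d_relu(h, w)
        pooled.append(h.max(axis=0))  # max pooling over the sequence
    return np.concatenate(pooled) @ w_p
```

The different branch depths give the encoder receptive fields at multiple scales over the amino acid sequence.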
Structured knowledge encoder
To encode the structured knowledge DSK and PSK, we leverage ProNE [27], a fast and efficient network embedding algorithm, which is illustrated in Fig. 1D. ProNE transforms the knowledge graph KB into an embedding matrix Z through sparse randomized truncated singular value decomposition (tSVD) followed by spectral propagation enhancement. The structured knowledge features are obtained as follows:
$$ z_{SK}^{D} = Z_{D_{SK}}, \qquad z_{SK}^{P} = Z_{P_{SK}} \tag{5} $$
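A minimal sketch of this stage follows. It shows only a dense truncated SVD factorization and the embedding lookup; the real ProNE uses sparse randomized tSVD and additionally enhances the embeddings with spectral propagation, both omitted here.

```python
import numpy as np

def tsvd_embeddings(adj, dim):
    """Factorize a (normalized) adjacency matrix with truncated SVD.

    Returns one dim-dimensional embedding row per graph node, with the
    left singular vectors scaled by the square root of the singular values.
    """
    u, s, _ = np.linalg.svd(adj, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

def structured_knowledge_feature(emb, entity_index):
    """Eq. 5 as a row lookup: an entity's feature is its embedding row."""
    return emb[entity_index]
```

In KEDD, these embeddings are precomputed once over BMKG and then looked up by entity index during training.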
Unstructured knowledge encoder
To encode the unstructured knowledge DUK and PUK, we adopt PubMedBERT [28], a language model pretrained on biomedical corpora. As illustrated in Fig. 1E, PubMedBERT is composed of 12 Transformer layers, each comprising a self-attention module and a feed-forward network. Given the input tokens [t1, t2, ⋯, tL] where t1=[CLS], PubMedBERT transforms them into a series of contextualized embeddings [h1, h2, ⋯, hL], where hi ∈ ℝ768. Features for unstructured knowledge zUK are calculated by feeding the [CLS] embedding into a fully connected layer with dropout:
$$ z_{UK} = \mathrm{Dropout}\big(h_1 W_{UK} + b_{UK}\big) \tag{6} $$
where $W_{UK} \in \mathbb{R}^{768 \times d_{UK}}$ and $b_{UK} \in \mathbb{R}^{d_{UK}}$ are trainable parameters.
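The projection of the [CLS] embedding can be sketched as follows. The function is generic in its dimensions for illustration; in KEDD the hidden states come from PubMedBERT and have width 768.

```python
import numpy as np

def unstructured_feature(hidden, w_uk, b_uk, p_drop=0.1, train=False, rng=None):
    """Unstructured-knowledge feature (Eq. 6).

    hidden: (L, d) contextualized embeddings from the language model,
    with the [CLS] token at position 0.
    """
    z = hidden[0] @ w_uk + b_uk        # fully connected layer on [CLS]
    if train and rng is not None:      # inverted dropout, training only
        keep = rng.random(z.shape) >= p_drop
        z = z * keep / (1.0 - p_drop)
    return z
```

At test time the dropout branch is skipped, matching the usual train/eval distinction.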
Multimodal feature fusion
The feature vectors with respect to each modality for tasks defined in the “Preliminaries” section are detailed as follows.
For DTI:
$$ z_S = W_S\big(z_S^{D}\oplus z_S^{P}\big)+b_S, \qquad z_{SK} = W_{SK}\big(z_{SK}^{D}\oplus z_{SK}^{P}\big)+b_{SK} \tag{7} $$
For DP:
$$ z_S = W_S z_S^{D}+b_S, \qquad z_{SK} = W_{SK} z_{SK}^{D}+b_{SK} \tag{8} $$
For DDI:
$$ z_S = W_S\big(z_S^{D_1}\oplus z_S^{D_2}\big)+b_S, \qquad z_{SK} = W_{SK}\big(z_{SK}^{D_1}\oplus z_{SK}^{D_2}\big)+b_{SK} \tag{9} $$
where D1, D2 denote two input drugs.
For PPI:
$$ z_S = W_S\big(z_S^{P_1}\oplus z_S^{P_2}\big)+b_S, \qquad z_{SK} = W_{SK}\big(z_{SK}^{P_1}\oplus z_{SK}^{P_2}\big)+b_{SK} \tag{10} $$
where P1, P2 denote two input proteins.
WS, WSK, bS, bSK are trainable parameters. Notably, in DTI, DDI, and PPI, the textual descriptions of two biomolecules are concatenated with a [SEP] token before feeding them into PubMedBERT. Such a design enables the language model to better capture the co-occurrence of key information, thus supporting interaction prediction.
Finally, the features from molecular structures, structured knowledge, and unstructured knowledge are concatenated and passed into a multilayer perceptron to generate prediction results. We incorporate cross-entropy loss as the objective function:
$$ \mathcal{L} = -\big(y\log\hat{y} + (1-y)\log(1-\hat{y})\big) \tag{11} $$
where y ∈ {0, 1} is the ground-truth label.
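The fusion-and-scoring step can be sketched as follows. This is illustrative: KEDD's actual prediction head is a multilayer perceptron over the concatenated features, abstracted here as a callable.

```python
import numpy as np

def predict(z_s, z_sk, z_uk, head):
    """Concatenate the three modality features and score them with a
    prediction head (an MLP in KEDD)."""
    return head(np.concatenate([z_s, z_sk, z_uk]))

def bce_loss(y_hat, y):
    """Binary cross-entropy objective (Eq. 11)."""
    eps = 1e-12  # numerical safety for log
    return -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```

The same head-plus-loss pattern is shared across DTI, DP, DDI, and PPI, which is what allows the framework to cover all four tasks with minimal modifications.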
Multimodal knowledge acquisition
The majority of existing datasets for AI drug discovery only provide structural information DS, PS for drugs and proteins. As illustrated in Fig. 2, we propose two strategies to obtain the multimodal knowledge DSK, DUK, PSK, PUK: direct acquisition and sparse attention-based reconstruction.
Direct acquisition from the BMKG dataset
Based on public repositories [29–33], we build BMKG, a dataset containing molecular structure, interacting relationships, and expert-written textual descriptions for 6,917 drugs and 19,992 proteins. In total, BMKG contains 2,223,850 drug–drug links, 47,530 drug–protein links, and 633,696 protein–protein links. Details of our construction process are presented in Supplementary Section A and Fig. S1. The BMKG dataset functions as a dictionary, wherein biomolecular structures serve as keys, while structured and unstructured knowledge constitute values. We can efficiently acquire multimodal knowledge for drugs and proteins by conducting searches within BMKG based on identical SMILES strings or amino acid sequences.
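The dictionary behavior of BMKG can be sketched as below. The entry shown is a hypothetical placeholder, not an actual BMKG record.

```python
# A toy slice of BMKG, keyed by canonical structure strings (the real
# dataset stores 6,917 drugs and 19,992 proteins; this entry is a
# hypothetical placeholder).
bmkg = {
    "CC(=O)Oc1ccccc1C(=O)O": {
        "entity_id": 42,  # index into the KG embedding matrix
        "description": "a hypothetical expert-written textual description",
    },
}

def acquire(structure_key):
    """Direct acquisition: look up structured and unstructured knowledge
    by exact SMILES string or amino acid sequence.

    Returns None when the molecule is absent, i.e., the missing
    modality case that triggers reconstruction."""
    entry = bmkg.get(structure_key)
    if entry is None:
        return None
    return entry["entity_id"], entry["description"]
```

A failed lookup is exactly the situation handled by the sparse attention-based reconstruction described next.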
Mitigating missing modality with sparse attention and modality masking
Ideally, each molecule is accompanied by the corresponding structured and unstructured knowledge. However, as elucidated in Table 1, a considerable proportion of molecules, especially those recently discovered, remain unaccounted for in existing databases owing to the substantial expenses associated with manual annotation processes. This formidable missing modality problem significantly compromises the application of multimodal AI drug discovery approaches in real-world scenarios.
Table 1.
A summary of benchmark datasets. The total number of molecules in the dataset is to the right of /, and the number of molecules linked to BMKG is to the left of /.
| Task | Dataset | # Drugs | # Proteins | # Samples |
|---|---|---|---|---|
| DTI | BMKG-DTI | 2,803/2,803 | 2,810/2,810 | 47,391 |
| DTI | Yamanishi08 | 488/791 | 944/989 | 10,254 |
| DP | BBBP | 841/2,039 | - | 2,039 |
| DP | ClinTox | 556/1,478 | - | 1,478 |
| DP | Tox21 | 2,191/7,831 | - | 7,831 |
| DP | SIDER | 677/1,427 | - | 1,427 |
| DP | ToxCast | 1,875/8,575 | - | 8,575 |
| DP | MUV | 193/93,087 | - | 93,087 |
| DP | HIV | 297/41,127 | - | 41,127 |
| DP | BACE | 312/1,513 | - | 1,513 |
| DDI | Luo’s | 657/721 | - | 494,551 |
| PPI | SHS27k | - | 1,632/1,690 | 7,624 |
| PPI | SHS148k | - | 4,943/5,189 | 44,488 |
To mitigate this issue, we leverage the sparse attention [34] shown in Fig. 2A to reconstruct the structured knowledge features by querying the most relevant entities within the knowledge graph based on molecular structure. We project the molecular structure feature $z_S^D$ or $z_S^P$ to the feature space of structured knowledge with a fully connected layer. We use the projected result $q$ as the query, and the knowledge graph embedding matrix $Z$ calculated by the structured knowledge encoder as keys and values. The reconstructed structured knowledge feature $\hat{z}_{SK}$ is obtained as follows:
$$ \hat{z}_{SK} = \mathrm{softmax}\Big(\mathrm{Top}\big(q W_Q\,(Z W_K)^{\top}/\sqrt{d_{SK}},\; k\big)\Big)\, Z \tag{12} $$
where WQ, WK ∈ ℝdSK × dSK are trainable parameters. Top(A, k) identifies the k largest elements within A and discards the remaining elements by assigning them a similarity score of −∞. Different from traditional attention-based networks, WV is fixed as an identity matrix. In this way, the sparse attention can be viewed as a trainable interpolation module that dynamically explores and allocates different weights to the most relevant k entities within the knowledge graph.
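The reconstruction described above can be sketched as a single attention head (KEDD uses four heads, omitted here for brevity):

```python
import numpy as np

def sparse_attention(q, Z, w_q, w_k, k=16):
    """Single-head sparse attention reconstruction (a sketch of Eq. 12).

    q: projected structure feature (d,);  Z: KG embedding matrix (N, d).
    Only the top-k entities keep their similarity scores; the rest are
    set to -inf, so softmax assigns them zero weight. W_V is fixed to
    the identity, making the output an interpolation of rows of Z.
    """
    d = q.shape[0]
    scores = (q @ w_q) @ (Z @ w_k).T / np.sqrt(d)
    keep = np.argsort(scores)[-k:]            # Top(A, k)
    masked = np.full_like(scores, -np.inf)
    masked[keep] = scores[keep]
    weights = np.exp(masked - masked.max())   # softmax over kept entries
    weights /= weights.sum()
    return weights @ Z
```

Because W_V is the identity, the output always lies in the span of the k retrieved knowledge graph embeddings, which is what makes the module interpretable as an interpolation.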
Fig. 2.

The multimodal knowledge acquisition pipeline. (A) Sparse attention pipeline that takes the structural features as queries to obtain top-k relevant entities within BMKG. (B) We search for identical biomolecular structures in BMKG to obtain multimodal knowledge. If the search fails or the modality masking is triggered, we apply sparse attention to reconstruct the structured knowledge features.
On occasions where the missing modality problem is not too severe, the number of molecules and proteins that require reconstruction could be insufficient to train the sparse attention module. As depicted in Fig. 2B, we propose a modality masking strategy to address this issue. With probability P, we mask the structured knowledge inputs DSK and PSK obtained by direct acquisition and activate the reconstruction process with sparse attention. This strategy creates additional training samples, proportional to the original training set, for the sparse attention module. Notably, substituting the directly acquired structured knowledge features with the reconstructed features $\hat{z}_{SK}$ can be perceived as a form of data augmentation, thereby enhancing the robustness of our framework.
Evaluation
KEDD is applied on four popular downstream tasks with 13 benchmark datasets summarized in Table 1.
• DTI. We adopt two binary classification datasets: Yamanishi08 [35] and BMKG-DTI. Yamanishi08 is collected mainly from the KEGG database [31]. BMKG-DTI is constructed based on BMKG. More details of this dataset are available in Supplementary Section C and Fig. S2. We perform 5-fold cross-validation for the warm-start, cold-drug, and cold-protein settings, and 9-fold cross-validation for the cold-cluster setting, similar to [36]. Under the warm-start setting, drugs and proteins are randomly partitioned. Under the cold-drug, cold-protein, and cold-cluster settings, drugs, proteins, and both in the test set, respectively, are unseen during training. We report the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPR) as evaluation metrics.
• DP. We select eight representative binary classification datasets from MoleculeNet [37], a widely adopted benchmark for molecular machine learning. We adopt the Scaffold split, where drugs within the test set are distinct from those in the training set. The train–validation–test ratio is 8:1:1. We report AUROC for this task.
• DDI. We adopt Luo’s dataset [38], randomly split the binary classification dataset with a train–validation–test ratio of 8:1:1, and report AUROC and AUPR.
• PPI. We leverage the revised version of multilabel classification datasets SHS27k and SHS148k [39]. We follow the breadth first search (BFS) and depth first search (DFS) strategy [18] to split the dataset with a train–test ratio of approximately 4:1. We adopt the Micro F1 score as the evaluation metric.
Details of evaluation datasets and splitting protocols are presented in Supplementary Section B.
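The AUROC metric reported throughout the evaluation can be computed from ranks alone; a minimal reference implementation (which should agree with standard library implementations for binary labels) is:

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC via the Mann–Whitney rank statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count half)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)
```

AUPR is computed analogously from the precision–recall curve, and Micro F1 aggregates true/false positives over all PPI relation types before computing F1.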
Implementation details
Across our experiments, we set the number of attention heads within sparse attention as 4, and the number of extracted entities k as 16. The modality masking probability P is set to 0.05 during training and 0 during testing. To avoid information leakage, we remove connections between drugs and proteins in the test set of DDI, DTI, and PPI datasets from BMKG before calculating knowledge graph embeddings. KEDD adopts the Adam optimizer [40] with a weight decay of 10−6 to update model parameters. The KEDD model is trained on a single A100 GPU with 40 GB memory, with a maximum training cost of 1 day. Each experiment is performed three times with different seeds. The hyperparameters for each dataset are tuned by randomized grid search, and their choices are shown in Table S1.
Results
Performance evaluation on downstream tasks
In this section, we present and analyze the results of KEDD and baseline models on four downstream tasks. We demonstrate that structured and unstructured knowledge could provide valuable biomedical insights for drug discovery, and KEDD attains a comprehensive understanding of biomolecules with multimodal data. A detailed introduction of baselines is presented in Supplementary Section E.
Performance evaluation on DTI
We compare KEDD against machine learning models RF [41] and SVM [42], unimodal baselines including DeepDTA [43], GraphDTA [44], and MGraphDTA [26], and the multimodal approach KGE_NFM [13]. The AUROC results are shown in Fig. 3. The complete experimental results are displayed in Tables S2 and S3.
Fig. 3.

Performance comparison for drug–target interaction prediction under warm-start and cold-start settings. (A) AUROC on Yamanishi08 dataset. (B) AUROC on BMKG dataset.
Under the warm-start setting, deep learning models surpass machine learning baselines by a remarkable margin. Besides, models such as GraphDTA and MGraphDTA that incorporate graph neural network (GNN)-based drug encoders significantly outperform models like DeepDTA that incorporate CNN-based drug encoders, which corroborates prior studies [44]. While KGE_NFM incorporates simple molecular fingerprints to model molecular structure, it also yields promising results, because of the incorporation of knowledge graph embeddings. Remarkably, KEDD achieves the best results on both datasets. Compared to the state-of-the-art model MGraphDTA, KEDD achieves a notable gain of 3.4% and 3.5% in AUROC under the warm-start setting on Yamanishi08 and BMKG-DTI (paired t test, all P <1.3 × 10−6).
In comparison with the overoptimistic results under the warm-start setting, the performance of AI models declines significantly under cold-start settings. Under the cold-cluster setting that is the most challenging, deep learning baselines even underperform RF on Yamanishi08 dataset. Compared to structure-based models, multimodal approaches such as KEDD and KGE_NFM mitigate the cold-start problem and achieve superior performance. On Yamanishi08, KEDD achieves state-of-the-art results under the cold-drug and cold-cluster settings (paired t tests, all P < 1.0 × 10−2) and shows minor statistical difference with KGE_NFM (paired t test, P > 5.0 × 10−2) under the cold-protein setting. Notably, on BMKG-DTI where the missing modality problem does not exist, KEDD exhibits profound improvements over existing models with an average performance gain of 8.1%, 7.5%, and 5.2% on cold-drug, cold-protein, and cold-cluster scenarios, respectively (paired t tests, all P < 2.9 × 10−3). It even achieves competitive results with that of the warm-start setting. These results demonstrate the benefits of incorporating structured and unstructured knowledge, especially for molecules that are out of the generalization scope of structure-based models.
Performance evaluation on DP
Comparisons between KEDD and machine learning models including RF and SVM as well as unimodal baselines including MolCLR [9], KV-PLM [10], MoMu [45], MoCL [46], and GraphMVP [8] are presented in Table 2. KEDD achieves significant performance gains on four of the eight benchmarks, namely BBBP, ClinTox, Tox21, and ToxCast (paired t tests, all P < 4.1 × 10−2). These datasets encompass a relatively limited number of training samples, and the integration of multimodal knowledge endows KEDD with a more comprehensive understanding of the constrained data available. On the other two small datasets, SIDER and BACE, KEDD yields improvements of 2.3% and 2.1%, respectively, over its unimodal counterpart GraphMVP (paired t tests, all P < 1.6 × 10−2). However, it shows only a minor performance gain over RF on SIDER and underperforms the machine learning model SVM on BACE. We attribute this to the Scaffold split, which makes it challenging for deep learning models to grasp transferable characteristics from a few thousand training samples. On MUV and HIV, KEDD shows little statistical difference from GraphMVP (paired t tests, all P > 3.3 × 10−1). We speculate that molecules within these two datasets are mostly under investigation and distinct from those recorded in BMKG, which reduces KEDD to a unimodal paradigm. On average, KEDD yields an improvement of 2.6% in AUROC (paired t test, P < 1.3 × 10−2) over the state-of-the-art model GraphMVP. These promising outcomes validate the efficacy of integrating multimodal knowledge in DP, an aspect that has been disregarded in prior studies.
Table 2.
Performance comparison in AUROC (%) for drug property (DP) prediction on MoleculeNet. The best results are marked in bold, and the second-best results are underlined.
| Model | BBBP | ClinTox | SIDER | Tox21 | ToxCast | MUV | HIV | BACE | Average |
|---|---|---|---|---|---|---|---|---|---|
| RF | 64.9±1.7 | 66.2±1.0 | 65.8±2.0 | 69.7±1.1 | 59.3±1.2 | 69.0±2.5 | 73.3±0.5 | 79.0±0.9 | 68.4 |
| SVM | 64.5±0.0 | 70.6±0.1 | 56.8±0.0 | 65.1±0.1 | 58.2±0.0 | 64.1±0.1 | 69.2±0.0 | 86.5±0.0 | 66.9 |
| GIN | 65.4±2.4 | 74.9±0.8 | 61.6±1.2 | 58.0±2.4 | 58.8±5.5 | 71.0±2.5 | 75.3±0.5 | 72.6±4.9 | 67.2 |
| MolCLR | 71.1±1.4 | 61.1±3.6 | 57.7±2.0 | 74.0±1.0 | 61.6±0.6 | 73.2±2.1 | 74.4±1.3 | 76.7±3.0 | 68.7 |
| KV-PLM | 66.9±1.1 | 84.3±1.5 | 55.3±0.9 | 64.7±1.8 | 58.6±0.4 | 60.2±2.9 | 68.8±4.5 | 71.5±2.1 | 66.3 |
| MoMu | 70.5±2.0 | 79.9±4.1 | 60.5±0.9 | 75.6±0.3 | 63.4±0.5 | 70.5±1.4 | 75.9±0.8 | 76.7±2.1 | 71.6 |
| MoCL | 71.4±1.1 | 81.4±1.0 | 61.9±0.4 | 72.5±1.0 | 62.6±0.5 | 72.3±0.3 | 74.7±0.6 | 79.9±0.4 | 72.1 |
| GraphMVP | 72.4±1.6 | 77.5±4.2 | 63.9±1.2 | 74.4±0.5 | 63.1±0.4 | 75.0±1.0 | 77.0±1.0 | 81.2±0.9 | 73.1 |
| KEDD (w/o SK) | 72.7±1.0 | 86.2±2.9 | 61.9±0.8 | 74.9±0.5 | 63.3±0.5 | 72.0±1.1 | 76.2±1.7 | 81.3±2.0 | 73.6 |
| KEDD (w/o UK) | 72.2±1.2 | 72.5±6.4 | 63.9±0.6 | 75.8±0.3 | 62.8±1.2 | 71.2±0.7 | 75.3±0.8 | 82.5±1.2 | 72.0 |
| KEDD (w/o SA) | 72.3±1.1 | 87.2±1.3 | 62.8±1.5 | 75.1±1.0 | 63.9±0.2 | 72.8±0.6 | 76.3±0.9 | 82.4±0.4 | 74.1 |
| KEDD | 73.6±1.1 | 88.4±0.7 | 66.0±1.4 | 76.8±0.4 | 64.9±0.5 | 74.7±0.4 | 77.3±0.3 | 83.5±0.3 | 75.7 |
w/o SK, without structured knowledge; w/o UK, without unstructured knowledge; w/o SA, without sparse attention.
Performance evaluation on DDI
For this task, we adopt machine learning baselines including RF and SVM, unimodal baselines including DeepDTnet [47], DTINet [38], DeepR2cov [48], and MSSL2drug [49], and multimodal baselines including DDIMDL [50] and KGE_NFM [13]. The experimental results are shown in Table 3. Both machine learning baselines and deep learning baselines achieve promising results, indicating that both molecular structures and network topology provide valuable clues for identifying drug–drug interactions. Notably, KEDD achieves state-of-the-art results on Luo’s dataset in AUROC (paired t test, P < 2.1 × 10−13). While the AUPR score of KEDD is on par with MSSL2drug, our model exhibits significantly better stability between different runs. These results highlight the significance of jointly reasoning over molecular structures, knowledge graphs, and biomedical texts in this task.
Table 3.
Performance comparison in AUROC (%) and AUPR (%) for drug–drug interaction prediction (DDI) on Luo’s dataset
| Model | AUROC (%) | AUPR (%) |
|---|---|---|
| RF | 82.1±0.6 | 80.7±1.0 |
| SVM | 79.7±1.1 | 79.3±1.4 |
| DeepDTnet^a | 92.3±0.8 | 92.1±1.0 |
| DTINet^a | 92.9±0.6 | 92.7±0.9 |
| DeepR2cov^a | 93.1±0.9 | 91.2±1.2 |
| MSSL2drug^a | 95.1±0.4 | 94.4±1.1 |
| KGE_NFM^a | 91.6±0.8 | 90.7±1.0 |
| DDIMDL^a | 91.3±0.9 | 90.5±1.4 |
| KEDD (w/o SK) | 96.3±0.1 | 91.7±0.2 |
| KEDD (w/o UK) | 97.1±0.1 | 92.9±0.2 |
| KEDD (w/o SA) | 97.4±0.1 | 94.1±0.2 |
| KEDD | 97.5±0.1 | 94.4±0.2 |
^a These results are taken from MSSL2drug [49].
Performance evaluation on PPI
In Table 4, we show the results of KEDD on the SHS27k and SHS148k dataset, compared against machine learning baselines including RF and SVM, unimodal baselines including PIPR [39] and ESM-650M [11], as well as multimodal baselines including GNN-PPI [18] and OntoProtein [19]. On SHS27k, KEDD outperforms baselines under the DFS setting by 2.7% to 10.8% (paired t tests, all P < 3.3 × 10−2). Under the BFS setting that is more challenging, KEDD outperforms multimodal baselines that consist of a similar amount of parameters, but shows minor statistical difference with ESM-650M (paired t test, P > 4.2 × 10−1), the scale of which exceeds KEDD by an order of magnitude (650M versus 115M). On SHS148k, KEDD achieves overwhelming advantages, outperforming ESM-650M by 6.2% and 2.1% absolute gains on the BFS and DFS settings, respectively (paired t test, P < 1.8 × 10−2). We speculate that the disparity between the two datasets lies in scale, with the number of proteins within SHS27k being inadequate for training our model from scratch. In comparison, ESM-650M has attained a good grasp of protein sequences by pretraining with billions of proteins, probably including those within the test set of our datasets. While KEDD opts for MCNN due to computational constraints, we expect a better performance by leveraging more powerful protein sequence encoders.
Table 4.
Performance comparison in Micro F1 (%) for protein–protein interaction prediction (PPI) on SHS27k and SHS148k datasets. The best results are marked in bold, and the second-best results are underlined.
| Model | SHS27k DFS | SHS27k BFS | SHS148k DFS | SHS148k BFS |
|---|---|---|---|---|
| RF | 35.6±2.2 | 37.7±1.6 | 43.3±3.4 | 39.0±1.9 |
| SVM | 53.1±5.2 | 43.0±6.0 | 58.6±0.1 | 49.1±5.3 |
| PIPR | 53.0±2.0 | 47.1±2.4 | 56.5±1.2 | 48.3±0.7 |
| GNN-PPI | 55.1±1.1 | 52.4±2.1 | 59.3±0.9 | 44.8±3.1 |
| OntoProtein | 56.8±0.4 | 61.2±1.6 | 60.8±0.8 | 48.0±1.2 |
| ESM-650M | 61.1±1.0 | 62.9±1.2 | 63.2±0.8 | 55.2±0.5 |
| KEDD (w/o SK) | 60.4±1.5 | 55.6±0.6 | 66.8±1.2 | 55.0±1.2 |
| KEDD (w/o UK) | 62.8±2.0 | 61.3±1.0 | 68.2±0.9 | 55.3±0.8 |
| KEDD (w/o SA) | 63.4±1.3 | 62.3±1.2 | 68.9±0.8 | 57.2±0.5 |
| KEDD | 63.8±1.5 | 62.7±1.5 | 69.4±1.0 | 57.3±1.1 |
Ablation studies
Impact of structured and unstructured knowledge
The success of KEDD relies upon the integration of structured and unstructured knowledge, and we explore if these two components contribute equally to each downstream task. We implement two variants of our framework, namely, KEDD (w/o SK) and KEDD (w/o UK), by removing either the structured or unstructured knowledge. The experimental results are presented in Tables S2 and S3 and Tables 2 to 4.
We observe that removing either structured or unstructured knowledge leads to overall performance degradation, indicating that both modalities are indispensable and complementary to each other. Interestingly, structured knowledge plays a more significant role in interaction prediction tasks including DTI, DDI, and PPI. On DP, the impacts of structured and unstructured knowledge vary. For structured knowledge, these results corroborate the proximity hypothesis [51] that if two nodes within the knowledge graph share similar neighbors, they tend to possess analogous properties, connect with the same entity, and share similar embeddings. For unstructured knowledge, we posit that the input texts typically delineate certain aspects of drugs and proteins, which are implicitly connected and occasionally irrelevant to the downstream task. Notably, removing unstructured knowledge leads to a drastic performance decline of 15.9% on ClinTox. We posit that the dataset involves predicting the US Food and Drug Administration (FDA) approval state of drugs, which could be described verbatim or inferred from clinical trial outcomes and marketing information within texts.
Impact of sparse attention
To investigate if the proposed sparse attention mitigates the missing modality problem, we implement KEDD (w/o SA), where we use zero vectors instead of reconstructed features for drugs and proteins that are absent from BMKG. We measure the severity of the missing modality problem by the proportion of molecules without structured knowledge, and visualize its relationship with the performance gain attained by sparse attention in Fig. 4. We observe that the benefits of sparse attention are proportional to the severity of the missing modality problem, demonstrating its effectiveness.
Fig. 4.

Relationship between the performance gain of sparse attention and the proportion of molecules without structured knowledge. Each dot represents the result on a dataset, colored by the corresponding task.
Impact of modality masking
KEDD proposes modality masking to obtain more training samples for sparse attention and to improve robustness. We assess the impact of the masking rate P by experimenting on the Yamanishi08 dataset under the cold-drug setting. As shown in Table 5, P = 0.05 yields the best results. When modality masking is not applied (P = 0), performance deteriorates by 2.4% on average, demonstrating the significance of modality masking. Further increasing P results in a slight performance decline, suggesting that the reconstructed features may be suboptimal compared to the original knowledge graph embeddings.
Table 5.
Performance on DTI using Yamanishi08 dataset under the cold-drug setting with different modality masking probability P
| P | AUROC | AUPR |
|---|---|---|
| 0.00 | 78.0±2.6 | 76.4±2.6 |
| 0.05 | 80.4±3.3 | 78.7±3.8 |
| 0.10 | 80.2±2.5 | 78.5±2.9 |
| 0.20 | 79.1±3.0 | 77.8±3.4 |
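The masking step during training can be sketched as follows, under the assumption that masking is applied independently per sample in a batch; names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def modality_mask(kg_emb, reconstructed, p=0.05, rng=None):
    """With probability p per sample, replace the true KG embedding with the
    sparse-attention reconstruction, so the reconstruction path also receives
    training signal from molecules whose structured knowledge is available."""
    rng = rng or np.random.default_rng()
    mask = rng.random(len(kg_emb)) < p              # (batch,) which samples to mask
    out = np.where(mask[:, None], reconstructed, kg_emb)
    return out, mask

# stand-in batch: zeros mark "true" KG embeddings, ones mark reconstructions
batch_kg = np.zeros((6, 4))
batch_rec = np.ones((6, 4))
out, mask = modality_mask(batch_kg, batch_rec, p=0.5, rng=np.random.default_rng(0))
print(out.shape, mask.dtype)  # (6, 4) bool
```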
A case study on real-world drug discovery
To test the power of KEDD in real-world drug discovery scenarios, we perform a case study on drug repurposing involving angiotensin-converting enzyme 2 (ACE2), a protein that has proven to be an entry receptor of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [52,53]. We exclude samples containing ACE2 from the BMKG-DTI dataset and train KEDD on the modified dataset. Then, we predict the probability for each drug within the dataset to interact with ACE2 and select the top five candidates. The heterogeneous inputs of ACE2 and the selected drugs are presented in Fig. 5A and B. To explore the features of each modality, we visualize the features of molecular structure, structured knowledge, and unstructured knowledge for each drug via t-distributed stochastic neighbor embedding (t-SNE) [54] in Fig. 5C to E. More details are presented in Supplementary Section G.
Fig. 5.

Drug repurposing for ACE2. (A) Details of ACE2. (B) Top five drug candidates, the corresponding molecular structures, and textual descriptions. Expressions related to ACE2 are highlighted. (C) t-SNE visualization for molecular features. (D) t-SNE visualization for structured knowledge features. (E) t-SNE visualization for unstructured knowledge features. Drugs with prediction scores >0.5 based on each modality are highlighted, and the top five drug candidates are marked by different colors and indexes.
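Per-modality t-SNE projections like those in Fig. 5C to E can be produced with standard tooling; below is a minimal scikit-learn sketch, with random features standing in for the actual modality embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.standard_normal((20, 16))   # stand-in for per-drug modality features

# project to 2-D; perplexity must be smaller than the number of samples
emb = TSNE(n_components=2, perplexity=5, init="pca", random_state=0).fit_transform(features)
print(emb.shape)  # (20, 2)
```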
Among the five drugs KEDD identified, captopril and lisinopril are experimentally validated active compounds, whose binding affinity values are reported on PubChem [55]. Additionally, recent studies from the biomedical domain point out that vitamin C and enalaprilat also exhibit lowering effects on the protein [56–58], and an in silico work suggests that framycetin could be a potential ACE2 inhibitor [59].
As shown in Fig. 5C and D, the molecular structure and structured knowledge features of the five drugs are mapped close to each other, indicating that these modalities play major roles in identifying candidates. Besides, the inhibitory effects of enalaprilat, captopril, and lisinopril on ACE, a homologous protein of ACE2, are noted in their text descriptions, and their unstructured knowledge features fall within clusters that receive high prediction scores based solely on this modality.
From these results, we observe that KEDD is capable of identifying potential drugs for novel targets by comprehensively integrating structured and unstructured knowledge. The framework therefore holds promise for assisting real-world drug discovery applications.
Discussion
The ability to harness biomedical expertise from diverse multimodal sources is of crucial significance in biomedical research and drug discovery. KEDD serves as a pioneering effort to develop AI models that jointly exploit biomolecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical documents. Remarkably, KEDD can be flexibly applied to a wide range of downstream tasks with minimal modification of the model architecture. In addition, we address the missing modality problem, a common phenomenon in real-world scenarios where novel drugs and proteins are unrecorded in existing knowledge bases, and present a novel solution that reconstructs feature vectors with sparse attention and modality masking. Through extensive qualitative and quantitative analysis, we validate that both structured and unstructured knowledge can compensate for the deficiencies of AI models in predicting biomolecular properties and interactions. We also demonstrate the robustness of KEDD when the missing modality problem is pronounced, primarily owing to the proposed sparse attention and modality masking techniques. In the drug-repurposing case study on ACE2, four of our five prioritized candidates are validated by recent pharmaceutical studies, highlighting the promising potential of our framework in real-world drug discovery.
While KEDD bears promise in accelerating AI drug discovery research, future efforts are expected to address its limitations and further extend its benefits. First, KEDD predominantly focuses on the acquisition and incorporation of multimodal information and adopts GraphMVP and MCNN as the drug and protein encoders. More combinations of biomolecular structure modeling approaches, including those that incorporate the 3D geometries of drugs and proteins, could be applied and compared task by task to obtain a comprehensive view of different design choices. Second, the application scope of KEDD could be further extended. For example, additional biomedical entities, including diseases, genes, and cellular transcriptomics, could be considered, and more complicated AI drug discovery tasks such as drug–disease interaction prediction [60] and drug response prediction on cell lines [61] could be tackled. Finally, interpretability tools are needed to understand how KEDD makes predictions based on molecular structures, knowledge graphs, and biomedical texts, which would also provide more scientific insights for researchers in real-world applications.
Conclusions
In this work, we present KEDD, an end-to-end deep learning framework for unified AI drug discovery with multimodal knowledge. KEDD builds a novel feature fusion network to jointly harvest the advantages of molecular structure, structured knowledge within knowledge graphs, and unstructured knowledge within biomedical documents. To mitigate the missing modality problem, KEDD leverages sparse attention and a modality masking technique to exploit relevant information from existing knowledge graphs. The effectiveness of KEDD is validated by its state-of-the-art performance on a wide spectrum of downstream tasks, including DTI, DP, DDI, and PPI. With qualitative analysis, we show KEDD’s potential in assisting real-world drug discovery applications.
Acknowledgments
We thank Beijing Academy of Artificial Intelligence (BAAI) for their support.
Funding: This research was funded by the National Key R&D Program of China (2022YFF1203002).
Author contributions: J.Z., Y.W., and Z.N. conceived the idea. Y.L. and X.Y.L. developed the methodology and designed the experiments. Y.L., X.Y.L., and K.H. conducted the experiments. K.H. and M.H. curated the data. Y.L. and X.Y.L. wrote the manuscript. Z.N. supervised the study. All authors revised and approved the final version of the manuscript.
Ethical approval: This study does not involve any animal or human participants, nor does it take place in any private or protected areas. No specific permissions are required for corresponding locations.
Competing interests: The authors declare that they have no competing interests.
Data Availability
The Python code and datasets used in KEDD are available at https://github.com/icycookies/KEDD_temporal and https://github.com/PharMolix/OpenBioMed.
Supplementary Materials
References
- 1. Drews J. Drug discovery: A historical perspective. Science. 2000;287(5460):1960–1964.
- 2. Lomenick B, Olsen RW, Huang J. Identification of direct protein targets of small molecules. ACS Chem Biol. 2011;6(1):34–46.
- 3. Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, Doig A, Guilliams T, Latimer J, McNamee C, et al. Drug repurposing: Progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58.
- 4. Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. Artificial intelligence in drug discovery and development. Drug Discov Today. 2021;26(1):80–93.
- 5. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36.
- 6. Chaudhri VK, Baru C, Chittar N, Dong XL, Genesereth M, Hendler J, Kalyanpur A, Lenat DB, Sequeda J, Vrandečić D, et al. Knowledge graphs: Introduction, history, and perspectives. AI Mag. 2022;43(1):17–29.
- 7. Saxena S, Sangani R, Prasad S, Kumar S, Athale M, Awhad R, et al. Large-scale knowledge synthesis and complex information retrieval from biomedical documents. In: 2022 IEEE International Conference on Big Data (Big Data). Osaka, Japan: IEEE; 2022. p. 2364–2369.
- 8. Liu S, Wang H, Liu W, Lasenby J, Guo H, Tang J. Pre-training molecular graph representation with 3D geometry. Paper presented at: International Conference on Learning Representations 2022; 2022.
- 9. Wang Y, Wang J, Cao Z, Farimani AB. Molecular contrastive learning of representations via graph neural networks. Nat Mach Intell. 2022;4:279–287.
- 10. Zeng Z, Yao Y, Liu Z, Sun M. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nat Commun. 2022;13(1):862.
- 11. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci USA. 2021;118(15): Article e2016239118.
- 12. Thafar MA, Olayan RS, Ashoor H, Albaradei S, Bajic VB, Gao X, Gojobori T, Essack M. DTiGEMS+: Drug–target interaction prediction using graph embedding, graph mining, and similarity-based techniques. J Chem. 2020;12(1):44.
- 13. Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug–target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun. 2021;12(1):6775.
- 14. Yu L, Qiu W, Lin W, Cheng X, Xiao X, Dai J. HGDTI: Predicting drug–target interaction by using information aggregation based on heterogeneous graph neural network. BMC Bioinformatics. 2022;23(1):126.
- 15. Asada M, Miwa M, Sasaki Y. Enhancing drug–drug interaction extraction from texts by molecular structure information. Poster presented at: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 2018; Melbourne, Australia. p. 680–685.
- 16. Zhang W, Chen Y, Liu F, Luo F, Tian G, Li X. Predicting potential drug–drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinformatics. 2017;18(1):18.
- 17. Lin X, Quan Z, Wang ZJ, Ma T, Zeng X. KGNN: Knowledge graph neural network for drug–drug interaction prediction. Paper presented at: IJCAI. vol. 380. International Joint Conferences on Artificial Intelligence Organization; 2020; Montreal, Canada. p. 2739–2745.
- 18. Lv G, Hu Z, Bi Y, Zhang S. Learning unknown from correlations: Graph neural network for inter-novel-protein interaction prediction. Paper presented at: International Joint Conferences on Artificial Intelligence Organization; 2021; Montreal, Canada. p. 3677–3683.
- 19. Zhang N, Bi Z, Liang X, Cheng S, Hong H, Deng S, Lian J, Zhang Q, Chen H. OntoProtein: Protein pretraining with gene ontology embedding. In: International Conference on Learning Representations 2022; 2022.
- 20. Ma M, Ren J, Zhao L, Tulyakov S, Wu C, Peng X. SMIL: Multimodal learning with severely missing modality. Paper presented at: Proceedings of the AAAI Conference on Artificial Intelligence; 2021; Vancouver, Canada. p. 2302–2310.
- 21. Ma M, Ren J, Zhao L, Testuggine D, Peng X. Are multimodal transformers robust to missing modality? Paper presented at: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022; New Orleans, LA, USA. p. 18177–18186.
- 22. Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell. 2023;5(4):351–362.
- 23. Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform. 2019;93:103159.
- 24. Jones S, Thornton JM. Principles of protein–protein interactions. Proc Natl Acad Sci USA. 1996;93(1):13–20.
- 25. Xu K, Hu W, Leskovec J, Jegelka S. How powerful are graph neural networks? Paper presented at: International Conference on Learning Representations 2019; 2019; New Orleans, LA, USA.
- 26. Yang Z, Zhong W, Zhao L, Chen CY-C. MGraphDTA: Deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem Sci. 2022;13(3):816–833.
- 27. Zhang J, Dong Y, Wang Y, Tang J, Ding M. ProNE: Fast and scalable network representation learning. Paper presented at: International Joint Conferences on Artificial Intelligence Organization; 2019; Macao, China. p. 4278–4284.
- 28. Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc. 2021;3(1):1–23.
- 29. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31(1):365–370.
- 30. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D1082.
- 31. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2007;36(Database issue):D480–D484.
- 32. Zheng S, Rao J, Song Y, Zhang J, Xiao X, Fang EF, Yang Y, Niu Z. PharmKG: A dedicated knowledge graph benchmark for biomedical data mining. Brief Bioinform. 2021;22(4): Article bbaa344.
- 33. UniProt Consortium. UniProt: A hub for protein information. Nucleic Acids Res. 2015;43(Database issue):D204–D212.
- 34. Zhao G, Lin J, Zhang Z, Ren X, Sun X. Sparse transformer: Concentrated attention through explicit selection. arXiv. 2019.
- 35. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–i240.
- 36. Wang J, Wen N, Wang C, Zhao L, Cheng L. ELECTRA-DTA: A new compound–protein binding affinity prediction model based on the contextualized sequence encoding. J Chem. 2022;14(1):14.
- 37. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V. MoleculeNet: A benchmark for molecular machine learning. Chem Sci. 2017;9(2):513–530.
- 38. Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drug–target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573.
- 39. Chen M, Ju CJT, Zhou G, Chen X, Zhang T, Chang KW, Zaniolo C, Wang W. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics. 2019;35(14):i305–i314.
- 40. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv. 2014. 10.48550/arXiv.1412.6980
- 41. Ho TK. Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition. New York City: IEEE; 1995. p. 278–282.
- 42. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–297.
- 43. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics. 2018;34(17):i821–i829.
- 44. Nguyen T, Le H, Quinn TP, Nguyen T, Le TD, Venkatesh S. GraphDTA: Predicting drug–target binding affinity with graph neural networks. Bioinformatics. 2021;37(8):1140–1147.
- 45. Su B, Du D, Yang Z, Zhou Y, Li J, Rao A, Sun H, Lu Z, Wen J-R. A molecular multimodal foundation model associating molecule graphs with natural language. arXiv. 2022. 10.48550/arXiv.2209.05481
- 46. Sun M, Xing J, Wang H, Chen B, Zhou J. MoCL: Data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph. Paper presented at: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; 2021; Singapore. p. 3585–3594.
- 47. Zeng X, Zhu S, Lu W, Liu Z, Huang J, Zhou Y, Fang J, Huang Y, Guo H, Li L, et al. Target identification among known drugs by deep learning from heterogeneous networks. Chem Sci. 2020;11(7):1775–1797.
- 48. Wang X, Xin B, Tan W, Xu Z, Li K, Li F, Zhong W, Peng S. DeepR2cov: Deep representation learning on heterogeneous drug networks to discover anti-inflammatory agents for COVID-19. Brief Bioinform. 2021;22(6): Article bbab226.
- 49. Wang X, Cheng Y, Yang Y, Yu Y, Li F, Peng S. Multitask joint strategies of self-supervised representation learning on biomedical networks for drug discovery. Nat Mach Intell. 2023;5:445–456.
- 50. Deng Y, Xu X, Qiu Y, Xia J, Zhang W, Liu S. A multimodal deep learning framework for predicting drug–drug interaction events. Bioinformatics. 2020;36(15):4316–4322.
- 51. Qiu J, Chen Q, Dong Y, Zhang J, Yang H, Ding M, Wang K, Tang J. GCC: Graph contrastive coding for graph neural network pre-training. Paper presented at: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2020; San Diego, CA, USA. p. 1150–1160.
- 52. Zamorano Cuervo N, Grandvaux N. ACE2: Evidence of role as entry receptor for SARS-CoV-2 and implications in comorbidities. eLife. 2020;9: Article e61390.
- 53. Li Y, Zhou W, Yang L, You R. Physiological and pathological regulation of ACE2, the SARS-CoV-2 receptor. Pharmacol Res. 2020;157: Article 104833.
- 54. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(86):2579–2605.
- 55. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, et al. PubChem substance and compound databases. Nucleic Acids Res. 2016;44(D1):D1202–D1213.
- 56. Ivanov V, Goc A, Ivanova S, Niedzwiecki A, Rath M. Inhibition of ACE2 expression by ascorbic acid alone and its combinations with other natural compounds. Infect Dis (Auckl). 2021;14: Article 1178633721994605.
- 57. Zuo Y, Zheng Z, Huang Y, He J, Zang L, Ren T, Cao X, Miao Y, et al. Vitamin C is an efficient natural product for prevention of SARS-CoV-2 infection by targeting ACE2 in both cell and in vivo mouse models. bioRxiv. 2022. 10.1101/2022.07.14.499651
- 58. Moraes DS, Farias Lelis D, Andrade JMO, Meyer L, Guimarães ALS, Batista De Paula AM, Farias LC, Santos SHS. Enalapril improves obesity associated liver injury ameliorating systemic metabolic markers by modulating angiotensin converting enzymes ACE/ACE2 expression in high-fat feed mice. Prostaglandins Other Lipid Mediat. 2021;152: Article 106501.
- 59. Rampogu S, Lee KW. Pharmacophore modelling-based drug repurposing approaches for SARS-CoV-2 therapeutics. Front Chem. 2021;9: Article 636362.
- 60. Rohani N, Eslahchi C. Drug–drug interaction predicting by neural network using integrated similarity. Sci Rep. 2019;9(1):13645.
- 61. Zhang F, Wang M, Xi J, Yang J, Li A. A novel heterogeneous network-based method for drug response prediction in cancer cell lines. Sci Rep. 2018;8(1):3355.
- 62. Landrum G. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum. 2013;8:31.
- 63. Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: Gene-centered information at NCBI. Nucleic Acids Res. 2005;33(Database issue):D54–D58.
- 64. Povey S, Lovering R, Bruford E, Wright M, Lush M, Wain H. The HUGO Gene Nomenclature Committee (HGNC). Hum Genet. 2001;109:678–680.
- 65. Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, Barabási AL. Uncovering disease–disease relationships through the incomplete interactome. Science. 2015;347(6224): Article 1257601.
- 66. Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, Boucher L, Leung G, Kolas N, Zhang F, et al. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021;30(1):187–200.
- 67. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. STRING v10: Protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447–D452.
- 68. Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert DA, Hochreiter S. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci. 2018;9(24):5441–5451.
- 69. Liu H, Sun J, Guan J, Zheng J, Zhou S. Improving compound–protein interaction prediction by building up highly credible negative samples. Bioinformatics. 2015;31(12):i221–i229.
- 70. Chen L, Tan X, Wang D, Zhong F, Liu X, Yang T, Luo X, Chen K, Jiang H, Zheng M. TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics. 2020;36(16):4406–4414.
- 71. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–197.
- 72. Yang L, Xia J-F, Gui J. Prediction of protein–protein interactions from protein sequence using local descriptors. Protein Pept Lett. 2010;17(9):1085–1090.
- 73. Yang B, Yih W-t, He X, Gao J, Deng L. Embedding entities and relations for learning and inference in knowledge bases. Paper presented at: Proceedings of the International Conference on Learning Representations (ICLR) 2015; 2015; San Diego, CA, USA.
- 74. He X, Chua TS. Neural factorization machines for sparse predictive analytics. Paper presented at: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2017; Tokyo, Japan. p. 355–364.
- 75. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. 2018. 10.48550/arXiv.1810.04805
- 76. Natarajan N, Dhillon IS. Inductive matrix completion for predicting gene–disease associations. Bioinformatics. 2014;30(12):i60–i68.
- 77. Trott O, Olson AJ. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455–461.