Abstract
Non-coding RNAs (ncRNAs) play crucial roles in drug resistance and sensitivity, making them important biomarkers and therapeutic targets. However, predicting ncRNA-drug associations is challenging because of dataset imbalance and sparsity, which limits the identification of robust biomarkers. Existing models often fail to capture both local and global sequence information, limiting the reliability of their predictions. This study introduces DMGAT (diffusion map and heterogeneous graph attention network), a novel deep learning model for predicting ncRNA-drug associations. DMGAT integrates diffusion maps for dimensionality reduction of sequence embeddings, graph convolutional networks (GCNs) for feature extraction, and graph attention networks (GATs) for heterogeneous information fusion. To address dataset imbalance, the model incorporates sensitivity associations and employs a random forest classifier to select reliable negative samples. DMGAT embeds ncRNA sequences and drug SMILES using word2vec, capturing local and global sequence information. The model constructs a heterogeneous network by combining sequence similarity and Gaussian interaction profile kernel similarity, providing a comprehensive representation of ncRNA-drug interactions. Evaluated with five-fold cross-validation on a curated dataset from NoncoRNA and ncDR, DMGAT outperforms seven state-of-the-art methods, achieving the highest area under the receiver operating characteristic curve (0.8964), area under the precision-recall curve (0.8984), recall (0.9576), and F1-score (0.8285). The raw data are available on Zenodo under identifier 13929676. The source code of DMGAT is available at https://github.com/liutingyu0616/DMGAT/tree/main.
Keywords: ncRNA-drug resistance association prediction, diffusion map, graph convolutional network, graph attention network
Introduction
Non-coding RNAs (ncRNAs) play important roles in cell development, differentiation, and apoptosis. Numerous studies have shown that ncRNAs are extensively involved in human pathological pathways, and as biomarkers they offer new targets for treating diseases such as cancer [1–3]. Non-coding RNAs such as microRNAs (miRNAs), circular RNAs (circRNAs), long non-coding RNAs (lncRNAs), and Piwi-interacting RNAs (piRNAs) have attracted considerable research interest. Recent studies suggest that ncRNAs are involved in various aspects of tumor cell drug resistance, including epithelial–mesenchymal transition, DNA repair, drug efflux and metabolism, and cell cycle progression, highlighting their potential significance in drug therapy [4, 5]. Different ncRNAs have different functions according to their length and structure. miRNAs are short regulatory molecules involved in the post-transcriptional regulation of gene expression. Compared with linear miRNAs, circRNAs [6] are more stable and may function as carriers or scaffolds [7]. They regulate protein function by acting as inhibitors of miRNAs or proteins, or they can be translated to perform important biological functions [8, 9]. lncRNAs can participate in regulating interacting proteins. In contrast, piRNAs have been studied relatively little. piRNAs are 24–35 nucleotides long [10–12] and interact with proteins of the Piwi subfamily; they are essential for suppressing transposable elements, protecting the genome, and modifying histones, among other roles, and thereby influence gene expression. Different RNAs also interact synergistically. For example, lncRNAs can act as molecular sponges for miRNAs, regulating the expression of the miRNAs' target genes.
The increasing number of studies linking abnormal ncRNA expression to various drugs highlights the potential of these ncRNAs as diagnostic markers and therapeutic targets. More specifically, antagonistic interactions between certain ncRNAs and specific types of drugs can affect protein expression, which in turn influences human metabolism. For example, miR-181a modulates molecular pathways involved in glioblastoma drug resistance, thereby influencing tumor sensitivity to chemotherapy and potentially affecting treatment outcomes [13–15].
With the decreasing cost of and advances in sequencing technology, many databases now provide abundant resistance and sensitivity associations between ncRNAs and drugs, including both experimentally validated and predicted associations. The ncDR database [16] has collected 5864 experimentally verified resistance relationships between 1039 ncRNAs (162 lncRNAs and 877 miRNAs) and 145 compounds from around 900 published studies, and it has been used in many ncRNA association prediction studies, such as LRGCPND [17], RDRGSE [18], and GSLRDA [19]. NoncoRNA [20] provides a comprehensive database for searching drug resistance/sensitivity-related ncRNAs in different human cancers. ncRNADrug [21] is a comprehensive, integrated database of manually curated and predicted resistance/sensitivity associations between ncRNAs and drugs. Together, these databases provide a wealth of knowledge for understanding ncRNA-drug associations in humans.
To date, numerous ncRNA-drug associations have been validated through biological experiments. However, progress in this field has been constrained by the considerable time and resources that such validation requires [22, 23]. In response, researchers have developed various computational algorithms to explore and map association networks efficiently. For example, NTSHMDA [24] constructs a heterogeneous network integrating disease and microbe similarities with known associations and applies a weighted random walk based on network topology. KATZHMDA [25] combines multisource miRNA and disease similarity networks with known associations in a heterogeneous network and predicts links with the KATZ algorithm. SDLDA [26] applies singular value decomposition and deep learning to predict lncRNA-disease associations. AE-RF [27] combines a deep autoencoder and a random forest for circRNA-disease prediction. ABHMDA [28] uses k-means clustering to select reliable negative samples and an adaptive boosting classifier for microbe-disease prediction. DMFCDA [29] employs deep matrix factorization with a fully connected projection layer to extract latent circRNA-disease features, which are then fed into a neural network. In DMFMDA [30], one-hot encoded microbe and disease data are transformed into low-dimensional vectors by an embedding layer, and predictions are made by matrix factorization within a neural network.
Although these methods perform well, there is still room to improve feature mining for ncRNAs and drugs by using more comprehensive information. On the one hand, methods such as NTSHMDA and DMFMDA do not take the sequence information of RNAs and drugs into account and derive features solely from association matrices, which discards much of the valuable information contained in the sequences. On the other hand, existing models focus only on resistance and treat all other associations as unknown. Consequently, the sensitivity associations mixed in among these unknown associations are not exploited during training.
To overcome these limitations, we introduce the diffusion map and heterogeneous graph attention network (DMGAT) model, which combines graph convolutional and graph attention networks operating on features in diffusion map space with sensitivity associations. The diffusion map makes ncRNA and drug features vary more continuously in the manifold space, while the graph attention network (GAT) uses attention mechanisms to fully capture the association features between ncRNAs and drugs. The key contributions of our work are as follows:
We introduce DMGAT, a novel deep learning model that integrates diffusion maps with graph convolutional and attention networks for accurate prediction of ncRNA-drug associations.
DMGAT employs the word2vec technique to embed ncRNA sequences and drug SMILES, and constructs a heterogeneous network that combines sequence similarity and Gaussian interaction profile (GIP) kernel similarity.
Our approach tackles data imbalance and sparsity by incorporating sensitivity associations and using a random forest classifier to select reliable negative samples.
Our source code and data are available on GitHub (https://github.com/liutingyu0616/DMGAT/tree/main).
Materials
Dataset
The manually curated ncRNA-drug association databases NoncoRNA [20], ncDR [16], and ncRNADrug [21] were collected to build the benchmark dataset used in this work. We used the resistance associations from NoncoRNA and ncDR and the sensitivity associations from ncRNADrug.
NoncoRNA
NoncoRNA contains 8233 ncRNA-drug resistance associations between 5568 ncRNAs and 154 drugs in 134 cancers [20]. We used the February 2020 version, released at http://www.ncdtcdb.cn:8080/NoncoRNA.
ncDR
ncDR is an aggregated dataset of manually curated, verified and predicted ncRNA-drug associations. We used the June 2016 version of ncDR, which involves 145 drugs and 1039 ncRNAs (877 miRNAs and 162 lncRNAs) drawn from around 900 published studies [16]. The dataset is publicly released at https://www.mdpi.com/1422-0067/22/19/10508.
ncRNADrug
ncRNADrug comprises ncRNAs linked to drug resistance as well as ncRNAs that are drug targets, both experimentally confirmed and computationally predicted. For experimentally validated entries, ncRNADrug includes 29 551 resistance records between 9195 ncRNAs (2248 miRNAs, 4145 lncRNAs, and 2802 circRNAs) and 266 drugs [21]. It also contains 32 969 records involving 10 480 ncRNAs (4338 miRNAs, 6087 lncRNAs, and 55 circRNAs) that are targeted by 965 drugs.
We selected only experimentally verified associations from the three databases. From NoncoRNA and ncDR, we removed redundant and ambiguous associations, as well as associations in which an lncRNA or miRNA is linked to only a single drug. From ncRNADrug, we selected the sensitivity associations and removed redundant ones. After preprocessing, we obtained 2693 resistance associations and 408 sensitivity associations between 622 ncRNAs (41 lncRNAs and 581 miRNAs) and 121 drugs. The dataset can be denoted as
$$\mathbb{D} = \mathbb{D}^{r} \cup \mathbb{D}^{s} \cup \mathbb{D}^{u} \tag{1}$$

where $\mathbb{D}^{r}$ represents the resistance associations, which contain 2689 ncRNA-drug resistance entries; $\mathbb{D}^{s}$ represents the sensitivity associations, which contain 408 ncRNA-drug sensitivity associations; and $\mathbb{D}^{u}$ represents the associations that have not been verified experimentally, of which there are 72576. Known associations account for 3.7% of all associations.
Methodology
We propose DMGAT, an ncRNA-drug association predictor based on a graph convolutional network and a graph attention network operating in diffusion map space. A random forest classifier and sensitivity associations are used to select reliable negative associations during training, which addresses dataset imbalance. The workflow of DMGAT, shown in Fig. 1, consists of three main steps: (i) embedding ncRNA sequences and drug SMILES with word2vec, reducing their dimension with a diffusion map, and computing similarities in diffusion map space; (ii) extracting ncRNA and drug features with separate graph convolutional networks; and (iii) predicting associations with a graph attention network.
Figure 1.
The flowchart of DMGAT. (a) ncRNA and drug node embedding and similarity calculation. ncRNA sequences and drug SMILES are split into 3-mers, which are replaced by their word2vec feature vectors. The features are flattened to 2D using a neural network layer, and their dimension is then reduced with a diffusion map. Sequence similarity is obtained from the Euclidean distance between features. (b) Feature extraction using GCNs. Two separate homogeneous graph convolutional networks extract ncRNA and drug features, respectively, based on their diffusion map features and similarities. (c) Prediction of resistance association scores using a graph attention network. ncRNA and drug features are combined through the attention mechanism, and the two feature matrices are multiplied to obtain the predicted adjacency matrix.
ncRNA sequence and drug SMILES embedding
In most previous studies, the combination of the row and column of the ncRNA-drug adjacency matrix was regarded as the feature of each association. However, the adjacency matrix is sparse and cannot accurately describe the differences between ncRNA-drug resistance pairs. Moreover, this representation discards the information contained in ncRNA sequences and in the SMILES (simplified molecular input line entry system) strings of drugs [31]. SMILES is a notation for drug structure that encodes the molecular structure of a compound using specific characters to denote atoms, bonds, and connectivity.
Word2Vec encoding
To avoid a sparse feature matrix, we used word2vec [32], a widely used sequence embedding technique from natural language processing. We used the word2vec implementation in the gensim library [33], which uses the continuous bag-of-words (CBOW) architecture by default. We split each ncRNA sequence and drug SMILES string into 3-mers and used the features of these 3-mers to represent the whole sequence. This method captures the information in local subsequences and also accounts for the positional relationships between neighboring 3-mers, reflecting patterns inherent to the sequence and preserving order information.
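The tokenization described above can be sketched as follows. This is a minimal illustration, assuming overlapping 3-mers with stride 1 (the stride is not stated in the text) and a symmetric context window; the helper names `to_kmers` and `cbow_examples` are hypothetical. It shows the (context, target) pairs a CBOW model learns from, which is how neighboring 3-mers shape each embedding.

```python
from typing import List, Tuple


def to_kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence (RNA or SMILES) into overlapping k-mers with stride 1."""
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build CBOW training pairs: the surrounding tokens (context) predict the centre token."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window) : i]
        right = tokens[i + 1 : i + 1 + window]
        context = left + right
        if context:  # a single-token sequence has no context to learn from
            examples.append((context, target))
    return examples


if __name__ == "__main__":
    fragment = "UGGAAUGU"  # hypothetical short RNA fragment
    kmers = to_kmers(fragment)
    print(kmers)
    for context, target in cbow_examples(kmers, window=1):
        print(context, "->", target)
```

With gensim installed, token lists such as `kmers` can be passed as the `sentences` argument of gensim's `Word2Vec`, where `sg=0` selects CBOW; the learned 3-mer vectors are then accessible through the model's `wv` attribute.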
The sequences of lncRNAs were obtained from LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook/home) [34], and those of miRNAs from miRBase (http://mirbase.org/) [35]. The SMILES of drugs were obtained from DrugBank (https://go.drugbank.com/), a “gold standard” knowledge base of drug-related information, including drug-target interactions and other pharmaceutical data [36].
After word2vec encoding, each ncRNA or drug is represented by a 2D array (one embedding vector per 3-mer), so the features of all ncRNAs of one type form a 3D array. For subsequent processing, this 3D array must be reduced to a 2D array. Here, a fully connected neural network layer reduces the features from 3D to 2D, and its loss function is optimized to make the variance of the compressed 2D array as large as possible. This objective makes the linearly compressed features more dispersed in the feature space, so that they better reflect the differences between ncRNAs and between drugs.
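One way to read this compression step is as a linear fully connected layer trained to maximize the variance of its outputs. The sketch below assumes that the per-entity 3-mer arrays are zero-padded to a common length and that the layer weights are kept orthonormal; without such a constraint the variance objective is unbounded, because scaling the weights scales the variance. Under these assumptions gradient ascent recovers the leading principal subspace, so the sketch is essentially PCA; the function names are hypothetical and the authors' actual layer and loss may differ.

```python
import numpy as np


def pad_and_flatten(mats):
    """Zero-pad per-entity (n_kmers, dim) arrays to a common length, then flatten.

    Stacking the padded arrays gives the 3D tensor (entities x k-mers x dim);
    flattening each entity turns it into a 2D matrix (entities x features).
    """
    max_len = max(m.shape[0] for m in mats)
    dim = mats[0].shape[1]
    out = np.zeros((len(mats), max_len * dim))
    for i, m in enumerate(mats):
        out[i, : m.size] = m.ravel()  # positions beyond the sequence length stay zero
    return out


def variance_maximizing_projection(x, out_dim, steps=500, lr=1.0, seed=0):
    """Linear layer W (features -> out_dim) trained to maximize output variance.

    The objective trace(W^T C W) is unbounded unless W is constrained, so the
    columns of W are re-orthonormalized with a QR decomposition after each step.
    """
    xc = x - x.mean(axis=0)                    # centre the features
    cov = xc.T @ xc / (len(x) - 1)             # sample covariance C
    rng = np.random.default_rng(seed)
    w, _ = np.linalg.qr(rng.standard_normal((x.shape[1], out_dim)))
    for _ in range(steps):
        grad = 2.0 * cov @ w                   # gradient of trace(W^T C W)
        w, _ = np.linalg.qr(w + lr * grad)     # ascent step, then project back
    return xc @ w, w                           # compressed features and weights
```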
Diffusion map
Because RNA sequences typically change by only a few bases at a time, RNA features may vary continuously along a path. For drugs, we assume that drugs with similar SMILES have similar functional properties. We therefore use a diffusion map to compress the features of ncRNAs and drugs. Diffusion maps [37] are a powerful tool for encoding long sequences because they capture the intrinsic geometric structure of high-dimensional data. The technique constructs a diffusion process from the pairwise similarities between sequences, which preserves both local and global relationships among ncRNA and drug sequences [38]. Unlike linear dimensionality reduction methods such as PCA, diffusion maps can identify non-linear patterns and capture continuously changing features, which is critical in biological datasets where the relationships between features are often complex. Using diffusion maps for sequence encoding improves robustness to noise, preserves the biological manifold structure, and provides effective dimensionality reduction for downstream analysis [39]. The numbers of eigenvectors computed in the diffusion map for lncRNAs, miRNAs, and drugs are 9, 150, and 40, respectively, approximately one quarter of their respective feature dimensions. The scale parameter ε is selected automatically with the algorithm of Berry, Harlim, and Giannakis. The normalization parameter α is set to 1, which removes the influence of the sampling density so that the diffusion operator approximates the Laplace–Beltrami operator of the underlying manifold, and a ball tree is used for nearest-neighbor search.
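A minimal diffusion map in the Coifman–Lafon formulation is sketched below. It assumes a dense Gaussian kernel with a fixed, user-supplied `epsilon` in place of the automatic Berry–Harlim–Giannakis selection and the ball-tree nearest-neighbor search described above, so it illustrates the computation rather than reproducing the authors' configuration; the function name is hypothetical. With `alpha=1` the kernel is first divided by the estimated sampling density, the row-normalized result is a Markov matrix, and its leading non-trivial eigenvectors, scaled by their eigenvalues, give the diffusion coordinates.

```python
import numpy as np


def diffusion_map(x, n_components, epsilon, alpha=1.0, t=1):
    """Diffusion-map coordinates for the rows of x (Coifman-Lafon construction).

    Uses a dense Gaussian kernel with a fixed bandwidth epsilon; no automatic
    bandwidth selection and no nearest-neighbour sparsification.
    """
    # Gaussian kernel on pairwise squared Euclidean distances
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    k = np.exp(-d2 / epsilon)

    # alpha-normalization: divide out the kernel density estimate q
    q = k.sum(axis=1)
    k_alpha = k / np.outer(q**alpha, q**alpha)

    # Markov matrix P = D^-1 K_alpha is row-stochastic; solve its eigenproblem
    # through the symmetric conjugate D^-1/2 K_alpha D^-1/2 (same eigenvalues)
    d = k_alpha.sum(axis=1)
    sym = k_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P (up to scale); the first is constant and dropped
    psi = evecs / np.sqrt(d)[:, None]
    coords = psi[:, 1 : n_components + 1] * evals[1 : n_components + 1] ** t
    return coords, evals
```

In the setting described above, `n_components` would be 9, 150, or 40 for lncRNAs, miRNAs, and drugs, respectively.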
A fully connected layer then extracts features from the diffusion map space to unify the feature dimensions of each type of ncRNA and of drugs.
ncRNA-drug adjacency network construction
The adjacency matrix indicates whether there is a resistance association between each ncRNA and each drug. It can be denoted as

$$A = \left(a_{ij}\right) \in \{0, 1\}^{m \times n} \tag{2}$$

where $m$ is the number of ncRNAs and $n$ is the number of drugs; $a_{ij} = 1$ if there is a resistance association between the $i$th ncRNA and the $j$th drug, and $a_{ij} = 0$ otherwise.
ncRNA similarity matrix
The similarity between ncRNAs is obtained from the sequence of each ncRNA and the adjacency matrix, which can be denoted as,
![]() |
(3) |
The sequence similarity is obtained by computing the Euclidean distance between ncRNAs in diffusion map space, which can be represented as
![]() |
(4) |
where $r_i$ and $r_j$ are the $i$th and $j$th ncRNAs, respectively.
The Gaussian interaction profile (GIP) kernel similarity has been widely used to assess the similarity between two nodes in ncRNA-drug association prediction [38, 40]. It rests on the assumption that ncRNAs with similar interaction profiles tend to interact with similar drugs, and vice versa. It measures the similarity between two entities by applying a Gaussian kernel to their interaction profiles, which are derived from the available interaction data. The GIP kernel similarity between the $i$th and $j$th ncRNAs is as follows [37]:
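$$RS^{GIP}(r_i, r_j) = \exp\left(-\gamma_r \lVert A(r_i) - A(r_j) \rVert^2\right) \tag{5}$$

where $A(r_i)$ and $A(r_j)$ are the $i$th and $j$th row vectors of the adjacency matrix $A$, and $\gamma_r$ is the kernel width coefficient, defined as

$$\gamma_r = \left(\frac{1}{m}\sum_{i=1}^{m} \lVert A(r_i) \rVert^2\right)^{-1} \tag{6}$$

where $m$ is the total number of ncRNAs (lncRNAs and miRNAs) and $A(r_i)$ is the $i$th row vector of $A$.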
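Equations 5 and 6 translate into a few lines of NumPy. In this sketch the function name is hypothetical and the numerator of the kernel width is taken as 1, the common choice in GIP-based methods; applying the same function to the transposed adjacency matrix gives the drug kernel, whose profiles are the columns of the adjacency matrix.

```python
import numpy as np


def gip_kernel(adj):
    """Gaussian interaction profile kernel between the rows of a 0/1 adjacency matrix.

    gamma = 1 / mean(||row||^2); sim(i, j) = exp(-gamma * ||row_i - row_j||^2).
    For drugs, pass adj.T so that the profiles are the columns of A.
    """
    adj = np.asarray(adj, dtype=float)
    sq = np.sum(adj**2, axis=1)                    # ||A(r_i)||^2 for every row
    mean_sq = sq.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 0.0  # guard against an all-zero matrix
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * adj @ adj.T, 0.0)
    return np.exp(-gamma * d2)
```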
The combined integrated similarity between ncRNAs is calculated by taking the average of the sequence similarity and GIP kernel similarity,
$$RS(r_i, r_j) = \frac{RS^{seq}(r_i, r_j) + RS^{GIP}(r_i, r_j)}{2} \tag{7}$$
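The averaging in Equation 7 can be sketched together with a sequence similarity derived from Euclidean distance. The text does not state how distances in diffusion map space are converted into similarities, so the conversion `1 / (1 + d)` below is a hypothetical stand-in, as are the helper names; the GIP matrix is assumed to be computed separately from the adjacency matrix.

```python
import numpy as np


def euclidean_distances(z):
    """Pairwise Euclidean distances between the rows of z (diffusion map features)."""
    sq = np.sum(z**2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0))


def integrated_similarity(z, gip):
    """Average of a distance-based sequence similarity and a GIP kernel matrix.

    Hypothetical choice: similarity = 1 / (1 + distance), which maps identical
    features to 1 and decays toward 0 as features move apart.
    """
    seq_sim = 1.0 / (1.0 + euclidean_distances(z))
    return (seq_sim + gip) / 2.0
```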
Drug similarity matrix
The similarity between drugs is obtained from the SMILES of each drug and the adjacency matrix. The SMILES similarity between drugs is computed in the same way as the ncRNA sequence similarity:
![]() |
(8) |
where $d_i$ and $d_j$ are the $i$th and $j$th drugs, respectively.
The GIP kernel similarity between the $i$th and $j$th drugs is as follows:
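$$DS^{GIP}(d_i, d_j) = \exp\left(-\gamma_d \lVert A(d_i) - A(d_j) \rVert^2\right) \tag{9}$$

where $A(d_i)$ and $A(d_j)$ are the $i$th and $j$th column vectors of the adjacency matrix $A$, and $\gamma_d$ is the kernel width coefficient, defined as

$$\gamma_d = \left(\frac{1}{n}\sum_{i=1}^{n} \lVert A(d_i) \rVert^2\right)^{-1} \tag{10}$$

where $n$ is the total number of drugs and $A(d_i)$ is the $i$th column vector of $A$.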
The integrated similarity between drugs is calculated as the average of the SMILES similarity and the GIP kernel similarity:

$$DS(d_i, d_j) = \frac{DS^{seq}(d_i, d_j) + DS^{GIP}(d_i, d_j)}{2} \tag{11}$$
Feature extraction using GCN
GCNs have been widely used to aggregate node features because they capture hidden graph relations and propagate information through the network [19, 41–45]; they extract features from networks efficiently by aggregating information from neighboring nodes. ncRNAs and drugs can each be regarded as nodes in their own graph, with the corresponding similarity matrix serving as the transition matrix for feature propagation [46]. Each GCN takes the ncRNA or drug features and the corresponding similarity matrix as input and performs graph convolutions to fuse their hidden features.
In a GCN, the output of the $l$th layer serves as the input to the $(l+1)$th layer, so that stacked layers capture higher-order features. The node embedding at the $(l+1)$th layer is given by

$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} S D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{12}$$

where $S$ is the integrated similarity matrix obtained by Equation 7 for ncRNAs and Equation 11 for drugs, $D$ is the degree matrix of $S$, and $H^{(l)}$ is the node embedding at the $l$th layer. For the ncRNA GCN, $H^{(0)}$ is the concatenation of the features of the two kinds of ncRNAs (lncRNAs and miRNAs); for the drug GCN, $H^{(0)}$ is the drug feature obtained from the diffusion map space. $W^{(l)}$ is the trainable parameter matrix, and $\sigma$ is the ReLU activation function. This yields the GCN-propagated features of ncRNAs and drugs, respectively. Both the ncRNA and the drug GCNs have two layers.
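A forward pass through the two-layer GCN can be sketched in NumPy as follows. The sketch assumes the symmetric normalization $D^{-1/2} S D^{-1/2}$ of the similarity matrix and omits training; the weight matrices are random stand-ins for the learned $W^{(l)}$, and the function names are hypothetical.

```python
import numpy as np


def normalize_similarity(s):
    """Symmetric normalization D^{-1/2} S D^{-1/2}, with D the degree matrix of S."""
    deg = s.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return s * inv_sqrt[:, None] * inv_sqrt[None, :]


def gcn_forward(s, h0, weights):
    """Stack of GCN layers H^{l+1} = ReLU(S_norm H^l W^l) over a similarity graph."""
    s_norm = normalize_similarity(s)
    h = h0
    for w in weights:
        h = np.maximum(s_norm @ h @ w, 0.0)  # propagate over the graph, transform, ReLU
    return h
```

Passing a list of two weight matrices reproduces the two-layer depth given above; the ncRNA and drug graphs would each get their own call with their own similarity matrix and weights.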
Feature extraction using GAT
The homogeneous GCNs above consider only the similarities among ncRNAs and among drugs, so the relationships between ncRNAs and drugs are integrated using a GAT [47, 48]. A GAT is a neural network that operates on graph-structured data and uses attention mechanisms to learn the importance of neighboring nodes [49]. In DMGAT, GAT layers aggregate the features of ncRNAs and drugs based on the adjacency matrix to capture their high-level features.
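The input to the GAT is the ncRNA and drug features obtained from the GCNs and the adjacency matrix between them. The output of the $(l+1)$th layer is

$$h_i^{(l+1)} = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j^{(l)}\right) \tag{13}$$

where $\sigma$ is the non-linear activation function ReLU and $\alpha_{ij}$ is the normalized attention coefficient between node $i$ and its neighbor node $j$. It measures the importance of node $j$ to node $i$ when aggregating information from the neighbors of $i$, and is defined as

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LR}\left(\mathbf{a}^{\top}\left[W h_i^{(l)} \,\Vert\, W h_j^{(l)}\right]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LR}\left(\mathbf{a}^{\top}\left[W h_i^{(l)} \,\Vert\, W h_k^{(l)}\right]\right)\right)} \tag{14}$$

where $\mathcal{N}_i$ is the set of neighbors of node $i$, $\mathbf{a}$ is a learnable weight vector for computing attention, LR is the LeakyReLU activation function, $W$ is a weight matrix applied to the node features, and $\Vert$ denotes concatenation. After GAT fusion, ncRNAs and drugs obtain new integrated features. The GAT has four layers.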
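A single-head GAT layer and the final score computation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the heterogeneous graph is assumed to connect ncRNA and drug nodes through the association matrix plus self-loops, weights are random rather than trained, and the helper names are hypothetical. The attention term $\mathbf{a}^{\top}[W h_i \Vert W h_j]$ is split into a source part and a neighbor part so that all pairs can be scored at once.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj_mask, w, a):
    """One single-head GAT layer: ReLU(sum_j alpha_ij W h_j) with masked softmax attention.

    h: (N, F) node features; adj_mask: (N, N) boolean neighbour mask;
    w: (F, F') weight matrix; a: (2F',) attention vector.
    """
    wh = h @ w                                   # W h_j for every node
    f_out = wh.shape[1]
    # a^T [W h_i || W h_j] = a_src . W h_i + a_dst . W h_j
    e = leaky_relu((wh @ a[:f_out])[:, None] + (wh @ a[f_out:])[None, :])
    e = np.where(adj_mask, e, -np.inf)           # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)         # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(alpha @ wh, 0.0), alpha


def heterogeneous_mask(assoc):
    """Neighbour mask over ncRNA + drug nodes from the m x n association matrix."""
    m, n = assoc.shape
    mask = np.eye(m + n, dtype=bool)             # self-loops (an assumption)
    mask[:m, m:] = assoc > 0
    mask[m:, :m] = (assoc > 0).T
    return mask


def predict_scores(h, m):
    """Split fused features into ncRNA and drug parts and multiply them (Fig. 1c)."""
    return h[:m] @ h[m:].T
```

Stacking four such layers, each with its own `w` and `a`, would match the layer count given above.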
Selecting reliable negative associations
Because of technical and cost constraints in sequencing, many associations between ncRNAs and drugs have not yet been discovered. In many graph-based ncRNA-drug association prediction studies, unlabeled associations are treated as negative samples during training [17–19]. Treating all unlabeled pairs as negatives may bias learning, because these samples do not accurately reflect the true negative class; this distorts the data distribution, degrades the trained model, and introduces class imbalance. We therefore use sensitivity associations between ncRNAs and drugs as negative samples. Because negative samples must be associations that are not resistance associations, experimentally verified sensitivity associations are the most reliable choice: biological experiments have established that these associations are not resistance associations. However, sensitivity associations are far fewer than positive associations, only about 15% as many (408 versus 2693). We therefore train a random forest classifier with resistance associations as positive samples and sensitivity associations as negative samples, and select the lowest-scoring unlabeled associations as reliable negative samples, so that the known and predicted sensitivity associations together equal the positive samples in number, which resolves the imbalance.
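The selection rule can be sketched as follows, assuming the classifier's resistance probability is already available for every unlabeled pair; how pairs are featurized for the random forest is not specified here, and the function name is hypothetical.

```python
import numpy as np


def select_reliable_negatives(unlabeled_scores, n_positive, n_known_negative):
    """Pick the lowest-scoring unlabeled pairs so that negatives match positives in number.

    unlabeled_scores: classifier probability of "resistance" for each unlabeled pair
    (e.g. from a random forest trained on resistance vs. sensitivity pairs).
    Returns indices into unlabeled_scores.
    """
    n_needed = max(n_positive - n_known_negative, 0)
    n_needed = min(n_needed, len(unlabeled_scores))
    order = np.argsort(unlabeled_scores, kind="stable")  # least resistance-like first
    return order[:n_needed]
```

With scikit-learn, such scores could come from a fitted `RandomForestClassifier` via `predict_proba`, whose second column is the probability of the positive class when the labels are 0 and 1.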
Results
Performance evaluation
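To evaluate DMGAT systematically and objectively, we used five-fold cross-validation. The set of experimentally confirmed ncRNA-drug resistance associations is denoted as $\mathbb{D}^{r}$, the set of unlabeled associations as $\mathbb{D}^{u}$, and the union of the verified sensitivity associations and the predicted reliable sensitivity associations as $\mathbb{D}^{n}$. Each set is divided into five subsets of the same size:

$$\mathbb{D}^{r} = \bigcup_{k=1}^{5} \mathbb{D}^{r}_{k}, \qquad \mathbb{D}^{u} = \bigcup_{k=1}^{5} \mathbb{D}^{u}_{k}, \qquad \mathbb{D}^{n} = \bigcup_{k=1}^{5} \mathbb{D}^{n}_{k} \tag{15}$$

where the subsets of each set are pairwise disjoint, that is, $\mathbb{D}_{k} \cap \mathbb{D}_{k'} = \varnothing$ for $k \neq k'$.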
The training set and test set can be denoted as follows:
![]() |
(16) |
where $k = 1, \dots, 5$ indexes the folds and the overline denotes the complement operation. Because the training set differs between folds, the GIP kernel similarity matrices $RS^{GIP}$ and $DS^{GIP}$ must be recalculated from the current training set.
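The per-fold bookkeeping can be sketched as below, assuming each association set is shuffled randomly before splitting (the shuffling scheme and helper names are hypothetical). The key point is that the adjacency matrix used for the GIP kernels is rebuilt from the training positives of each fold only, so test associations do not leak into the similarity features.

```python
import numpy as np


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Shuffle item indices and split them into n_folds disjoint, near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_items), n_folds)


def training_adjacency(shape, positive_pairs, test_idx):
    """Adjacency matrix built only from training positives (test-fold pairs masked out).

    positive_pairs: (P, 2) array of (ncRNA index, drug index) resistance pairs.
    """
    adj = np.zeros(shape)
    train = np.setdiff1d(np.arange(len(positive_pairs)), test_idx)
    rows, cols = positive_pairs[train, 0], positive_pairs[train, 1]
    adj[rows, cols] = 1.0
    return adj
```

The same split would be applied separately to the resistance, negative, and unlabeled sets, and the per-fold adjacency matrix (and its transpose for drugs) would then be passed to the GIP computation.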
We evaluate performance using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), accuracy, precision, recall, and F1-score, which are commonly used to assess prediction models under class imbalance.
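For reference, these metrics can be computed with a few lines of NumPy. This is a minimal sketch with hypothetical function names: AUC uses the rank (Mann–Whitney) formulation, and AUPR is estimated by average precision, which is one common estimator and may differ slightly from trapezoidal integration of the precision-recall curve used by some toolkits. The 0.5 decision threshold in the test is an assumption.

```python
import numpy as np


def auc_roc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()   # ties count as half a win
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPR estimated as average precision: mean precision at each positive's rank."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()


def precision_recall_f1(y_true, y_pred):
    """Threshold-based precision, recall, and F1-score for binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    precision = tp / max(np.sum(y_pred), 1)
    recall = tp / max(np.sum(y_true), 1)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```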
Comparison with state-of-the-art methods
We compared DMGAT with seven state-of-the-art models on the same dataset: NTSHMDA [24], KATZMDA [25], SDLDA [26], AE-RF [27], ABHMDA [28], DMFCDA [29], and DMFMDA [30].
Table 1 shows the performance of these models. DMGAT shows clear advantages over the other methods in ncRNA-drug association prediction: it achieves the highest AUC (0.8964), AUPR (0.8984), recall (0.9576), and F1-score (0.8285), although its F1-score is only marginally higher than that of DMFMDA (0.8269). These results suggest that DMGAT performs well overall and handles imbalanced data effectively.
Table 1.
Performance comparison among different methods
| Methods | AUC | AUPR | Recall | F1-score |
|---|---|---|---|---|
| NTSHMDA | 0.7142 | 0.6391 | 0.4470 | 0.5289 |
| KATZMDA | 0.7544 | 0.8048 | 0.6223 | 0.6964 |
| SDLDA | 0.8258 | 0.8663 | 0.6978 | 0.7588 |
| AE-RF | 0.8390 | 0.8535 | 0.6127 | 0.6881 |
| ABHMDA | 0.8428 | 0.8516 | 0.8413 | 0.7756 |
| DMFCDA | 0.8449 | 0.8649 | 0.7463 | 0.7499 |
| DMFMDA | 0.8491 | 0.8546 | 0.8288 | 0.8269 |
| DMGAT | **0.8964** | **0.8984** | **0.9576** | **0.8285** |
Numbers in bold indicate the best performance.
Unlike most other models, such as NTSHMDA, DMFMDA, and SDLDA, which use one-hot encoding and ignore important sequence information, our model extracts features from ncRNA and drug sequences. In addition, sensitivity associations help select more reliable negative samples, which mitigates sample imbalance. Moreover, diffusion map dimensionality reduction enhances feature continuity in the manifold space, which benefits downstream learning. Together, these choices allow the GAT to use attention mechanisms to capture the association features between ncRNAs and drugs more fully, which likely accounts for the improved performance.
We also compared DMGAT with other models separately on lncRNA-drug and miRNA-drug resistance association prediction. Table 2 compares DMGAT with several state-of-the-art lncRNA-drug prediction models, whose results are taken from [50]. Table 3 shows the comparison for miRNA-drug associations, with results for the other miRNA-drug prediction models taken from [51].
Table 2.
Performance of lncRNA-drug prediction comparison among different methods
| Methods | AUC | AUPR |
|---|---|---|
| NetLapRLS | 0.811 | 0.547 |
| BLM-NI | 0.784 | 0.560 |
| KBMF2K | **0.841** | 0.521 |
| CMF | 0.834 | 0.538 |
| DMGAT | 0.8397 | **0.8431** |
Numbers in bold indicate the best performance.
Table 3.
Performance of miRNA-drug prediction comparison among different methods
| Methods | AUC | AUPR | Recall | F1-score |
|---|---|---|---|---|
| CF | 0.8618 | 0.2046 | 0.3314 | 0.2873 |
| LP | 0.8610 | 0.2262 | 0.3176 | 0.3075 |
| GF | 0.8530 | 0.1619 | 0.2745 | 0.2318 |
| SDNE | 0.8693 | 0.1872 | 0.3012 | 0.2629 |
| DMGAT | **0.8928** | **0.8961** | **0.9500** | **0.8267** |
Numbers in bold indicate the best performance.
Ablation study
Impact of each module
To evaluate the contribution of each module, we compared the full DMGAT model with variants in which specific modules were replaced or removed. The full model achieves the highest AUC and AUPR. Replacing word2vec encoding with one-hot, TF-IDF (term frequency-inverse document frequency), or BoW (bag-of-words) encoding causes a slight drop in performance. TF-IDF is a numerical statistic that reflects the importance of a word within a document relative to a corpus, considering both its frequency in the document and its rarity across the entire dataset. BoW represents a document as an unordered set of words, disregarding grammar and word order while preserving word frequency. Removing the diffusion map feature extraction module degrades performance. In the variant without the GCN module, the ncRNA and drug similarity matrices are placed in the diagonal blocks of the adjacency matrix of the GAT's heterogeneous graph, representing transitions of ncRNAs and drugs to themselves. In the variant without the GAT module, the ncRNA and drug features output by the graph convolutional networks are multiplied directly to obtain the predicted association scores. Removing the GAT module causes the largest drop in AUC (from 0.8964 to 0.8211), suggesting that it is a critical component.
Removing the GCN or the diffusion map also degrades performance; removing the GCN causes the largest drops in AUPR (0.8416) and F1-score (0.7641). As shown in Table 4, the full DMGAT model, which combines all three modules, achieves the highest AUC, AUPR, and F1-score, although several ablated variants attain slightly higher recall.
Table 4.
Performance comparison among removing different modules of DMGAT
| Methods | AUC | AUPR | Recall | F1-score |
|---|---|---|---|---|
| DMGAT (one-hot) | 0.8741 | 0.8778 | 0.9577 | 0.8031 |
| DMGAT (TF-IDF) | 0.8629 | 0.8668 | 0.9557 | 0.7952 |
| DMGAT (BoW) | 0.8534 | 0.8605 | 0.9587 | 0.7855 |
| DMGAT (no diffusion map) | 0.8378 | 0.8495 | 0.9633 | 0.7706 |
| DMGAT (no GCN) | 0.8311 | 0.8416 | **0.9655** | 0.7641 |
| DMGAT (no GAT) | 0.8211 | 0.8442 | 0.9533 | 0.7776 |
| DMGAT | **0.8964** | **0.8984** | 0.9576 | **0.8285** |
Numbers in bold indicate the best performance.
Impact of the number of GCN and GAT layers
To determine the optimal numbers of GCN and GAT layers, we varied each from 1 to 5. Fig. 2 shows the AUC and AUPR values for different combinations of GCN and GAT layer numbers. AUC peaks at 0.8964 with two GCN layers and four GAT layers, while AUPR remains relatively high at 0.8984. We therefore use two GCN layers and four GAT layers.
Figure 2.
(a) AUC values of DMGAT with different number of GCN and GAT layers, (b) AUPR values of DMGAT with different number of GCN and GAT layers.
Case study
To verify whether the model can predict new ncRNA-drug associations, we trained it on all data, scored every ncRNA-drug pair, and selected the 20 unlabeled associations with the highest predicted scores. As shown in Table 5, 17 of these 20 associations have been experimentally verified. For instance, miR-26b enhances gemcitabine resistance in the pancreatic cancer cell line PANC-1 by inhibiting p53 expression through targeting the 3′ UTR of the p53 gene [52]. Similarly, miR-195 down-regulation promotes 5-fluorouracil (5-FU) resistance in gastric cancer by upregulating HMGA1 expression, thereby contributing to acquired drug resistance [53]. Moreover, let-7c is down-regulated in gemcitabine-resistant pancreatic cancer cells, contributing to the acquisition of epithelial-to-mesenchymal transition (EMT) characteristics; restoring let-7c expression can reverse EMT and improve gemcitabine sensitivity, highlighting its potential as a therapeutic target for overcoming drug resistance [54]. Furthermore, miR-134 is down-regulated in cisplatin-resistant lung adenocarcinoma cells and contributes to multidrug resistance (MDR) by targeting forkhead box M1 and multidrug resistance-associated protein 1; restoring miR-134 expression may improve cisplatin sensitivity and help overcome MDR in lung adenocarcinoma [55].
Table 5.
The top 20 ncRNA-drug resistance associations ranked by predicted score
| ncRNA | Drug | Evidence (PMID) |
|---|---|---|
| miR-26b | Gemcitabine | 23799850 |
| miR-195 | 5-Fluorouracil | 31115003 |
| miR-26a | Gemcitabine | 39288140 |
| miR-200a | Gemcitabine | 19654291 |
| miR-365 | Doxorubicin | Unconfirmed |
| miR-30a | Cisplatin | 27212164 |
| miR-122 | Gemcitabine | 31733293 |
| miR-654-5p | Doxorubicin | 32329825 |
| miR-423 | Doxorubicin | 30344696 |
| let-7c | Gemcitabine | 19654291 |
| miR-768 | Doxorubicin | 32714393 |
| miR-4454 | Doxorubicin | 34777698 |
| miR-23b | Doxorubicin | 34216852 |
| miR-133b | Gemcitabine | Unconfirmed |
| miR-519d | 5-Fluorouracil | 29771440 |
| miR-216 | Gemcitabine | Unconfirmed |
| miR-196a-5p | 5-Fluorouracil | 33130965 |
| miR-146b | 5-Fluorouracil | 29048680 |
| miR-134 | Cisplatin | 28454276 |
| miR-449a | Cisplatin | 24248414 |
Discussion
To check whether the features extracted by our model are biologically plausible, we calculated the cosine similarity of features among ncRNAs associated with the same drug. Each drug is associated with several ncRNAs, and for each drug we computed the average pairwise cosine similarity among these ncRNAs. A higher value indicates that the ncRNAs associated with that drug have more similar features.
The distribution of these values over all drugs is shown in Fig. 3. The first column shows the distribution for known associated ncRNAs only, the second column for predicted associated ncRNAs only, and the third column for ncRNAs randomly assigned to drugs; these random assignments may include false associations.
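The per-drug statistic and its random baseline can be sketched as follows. This minimal illustration makes three assumptions. First, `features` holds one learned embedding vector per ncRNA. Second, `drug_to_ncrnas` maps each drug to the indices of its associated ncRNAs. Third, the random baseline keeps each drug's number of ncRNAs but draws them uniformly at random. The section does not specify how the random assignment was actually made.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_cosine(features: Sequence[Sequence[float]], members: Sequence[int]) -> float:
    """Average cosine similarity over all pairs of ncRNAs linked to one drug."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return float("nan")  # fewer than two ncRNAs: no pairs to compare
    return sum(cosine(features[a], features[b]) for a, b in pairs) / len(pairs)


def per_drug_scores(features, drug_to_ncrnas: Dict[int, List[int]]) -> Dict[int, float]:
    """One mean-pairwise-cosine value per drug (one point of the Fig. 3 distribution)."""
    return {d: mean_pairwise_cosine(features, m) for d, m in drug_to_ncrnas.items()}


def random_baseline(features, drug_to_ncrnas: Dict[int, List[int]], seed: int = 0) -> Dict[int, float]:
    """Same statistic after giving each drug the same number of randomly drawn ncRNAs."""
    rng = random.Random(seed)
    n = len(features)
    shuffled = {d: rng.sample(range(n), len(m)) for d, m in drug_to_ncrnas.items()}
    return per_drug_scores(features, shuffled)
```

Drugs linked to fewer than two ncRNAs have no pairs and yield NaN here, so they would need to be dropped before plotting.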
Figure 3.

Distribution of the mean pairwise cosine similarity between ncRNAs associated with the same drug.
In Fig. 3, the first column has the highest average value, indicating that ncRNAs associated with the same drug have very similar features. The average value of the second column is greater than that of the third, showing that the features of ncRNAs predicted to be associated with the same drug are closer to each other than those of randomly selected ncRNAs. This indicates that the model can distinguish ncRNAs associated with different drugs in the feature space.
Large-scale validation of case-study predictions remains challenging. We have validated our method and its predictions from three perspectives.
Computational evaluation: We constructed a reliable negative sample set and used metrics such as AUC and AUPR to show that our method outperforms existing approaches (Tables 1, 2 and 3).
Feature similarity analysis: We introduced a new metric based on the premise that ncRNAs with similar embedding features should be associated with similar drugs (Fig. 3).
Experimental verification: Of the top 20 predicted associations not present in the database, 17 are supported by experimental evidence reported by other researchers (Table 5). The number of confirmed predicted associations is higher than that of other methods.
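AUC and AUPR, the two headline metrics in the computational evaluation, can be computed without external libraries. In the following minimal sketch, AUC uses the Mann-Whitney rank formulation. AUPR is approximated by average precision, the step-wise estimator that scikit-learn's `average_precision_score` also uses. The paper may have used a different AUPR estimator, such as trapezoidal integration of the precision-recall curve, which can give slightly different values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as the mean of the precision values at each positive.

    Simplification: tied scores are ordered by input position (stable sort).
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / k  # precision at this cut-off
    return ap / n_pos
```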
Although we have made substantial efforts to validate our approach computationally, biological experimental validation remains essential. Most studies analyze the relationship between an ncRNA and drug resistance in specific biological models in vivo and in vitro. We look forward to further biological validation of the three associations among the top 20 predictions that have not yet been experimentally verified.
Processing and validating large-scale data is also difficult because the cost does not grow linearly: as the number of ncRNAs and drugs increases, the computational power and time required for processing and verification grow substantially. Fortunately, research on ncRNA-drug associations is not yet as mature as research on drug–drug or drug–protein interactions, and the current data scale is still relatively small. In the future, as the data grow, we may consider techniques such as knowledge distillation to reduce computational overhead.
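A quick count shows why the cost grows faster than the data. The count assumes dense ncRNA and drug similarity matrices and a dense heterogeneous (ncRNA + drug) × (ncRNA + drug) adjacency matrix. This is a common construction, but this section does not confirm that DMGAT uses it. Under that assumption, doubling both the number of ncRNAs and the number of drugs quadruples every quantity below.

```python
def problem_size(n_ncrna: int, n_drug: int) -> dict:
    """Illustrative counts of quantities that scale with the data (assumed dense storage)."""
    n_nodes = n_ncrna + n_drug
    return {
        # Every ncRNA-drug pair must be scored at prediction time.
        "candidate_pairs": n_ncrna * n_drug,
        # One ncRNA x ncRNA and one drug x drug similarity matrix.
        "similarity_entries": n_ncrna ** 2 + n_drug ** 2,
        # Dense adjacency of the heterogeneous ncRNA + drug network.
        "heterogeneous_adjacency": n_nodes ** 2,
    }


small = problem_size(100, 10)
large = problem_size(200, 20)
print({key: large[key] / small[key] for key in small})
```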
Because our work focuses specifically on predicting ncRNA–drug associations, we did not include ncRNA–disease or drug–disease interaction data in this study. However, without the disease context there remains uncertainty about the therapeutic relevance of the predictions, especially in distinguishing true targeted treatment effects from secondary or off-target associations. For example, BC200 lncRNA is overexpressed in colorectal cancer cells and is located adjacent to the oncogene EpCAM, and both BC200 RNA and EpCAM are involved in cell migration and invasion. A drug targeting BC200 might therefore influence EpCAM activity, with effects that could be either therapeutic or off-target depending on the context [56]. Integrating disease-level associations is crucial for understanding how ncRNA-drug interactions translate into clinical outcomes.
However, bridging this gap requires establishing causal links between ncRNAs, drugs, and disease processes, and confirming these relationships through large-scale validation. Incorporating additional information when disease-related data are insufficient increases the complexity of a multimodal model and can even be counterproductive. Moreover, the usefulness of disease context differs between drug-sensitivity and drug-resistance prediction tasks [57]. An ncRNA's association with a disease may reflect many disease-related processes or biomarkers unrelated to drug action, so including such data could steer the model away from true resistance pathways.
For these reasons, we chose not to incorporate disease-level information into the current prediction model. If these issues can be addressed in the future, however, disease information could play a significant supporting role in drug screening and medical diagnosis.
Conclusion
In this study, we proposed DMGAT, a deep learning model based on diffusion maps, to predict ncRNA-drug associations. The model combines a graph convolutional network with a heterogeneous graph attention network to extract latent features of ncRNAs and drugs. In addition, to tackle the imbalance and sparsity of the dataset, ncRNA-drug sensitivity associations are introduced to select reliable negative samples.
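The final step of negative sampling ranks unlabeled pairs by a classifier's predicted resistance probability and keeps those least likely to be positive. According to the Abstract, DMGAT uses sensitivity associations and a random forest classifier for this step. The following minimal sketch covers only the final selection. It assumes that such a classifier has already produced a probability for every candidate pair; the function name and inputs are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def select_reliable_negatives(
    resistance_prob: Dict[Hashable, float],
    positives: Set[Hashable],
    n_negatives: int,
) -> List[Hashable]:
    """Return the n_negatives unlabeled pairs with the lowest predicted
    probability of being a resistance association.

    resistance_prob: hypothetical classifier output per candidate (ncRNA, drug) pair,
                     e.g. a random forest's positive-class probability.
    positives:       known resistance associations; never chosen as negatives.
    """
    unlabeled = [pair for pair in resistance_prob if pair not in positives]
    # Lowest probability first = most confidently negative.
    unlabeled.sort(key=lambda pair: resistance_prob[pair])
    return unlabeled[:n_negatives]
```

Setting `n_negatives` equal to the number of known positives would give a balanced training set. This section does not state whether DMGAT uses exactly that ratio.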
The performance of DMGAT was evaluated through 5-fold cross-validation on a dataset compiled from NoncoRNA and ncDR. DMGAT outperforms seven state-of-the-art methods in terms of AUC, AUPR, recall, and F1-score, and ablation experiments show that every module contributes to this performance. In a case study of the top 20 predicted associations, 17 were supported by experimental evidence in the literature. This demonstrates the model's ability to predict potential associations and offers valuable leads for future experimental validation.
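The 5-fold cross-validation can be sketched as a stratified split of labeled positive and negative pairs. This minimal illustration assumes stratification by label and a fixed random seed; the section does not say how DMGAT's folds were drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_k_fold(
    positives: Sequence[T], negatives: Sequence[T], k: int = 5, seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Split labeled pairs into k folds with the same positive/negative mix per fold.

    Returns k (train, test) tuples; every item appears in exactly one test fold.
    """
    rng = random.Random(seed)
    folds: List[List[T]] = [[] for _ in range(k)]
    for group in (list(positives), list(negatives)):
        rng.shuffle(group)
        # Deal items round-robin so each fold receives an equal share of each label.
        for idx, item in enumerate(group):
            folds[idx % k].append(item)
    splits = []
    for f in range(k):
        test = folds[f]
        train = [item for g, fold in enumerate(folds) if g != f for item in fold]
        splits.append((train, test))
    return splits
```

In association prediction, the test-fold positives should also be removed from the adjacency matrix and from any interaction-profile similarity, such as the Gaussian Interaction Profile kernel, before each fold is trained. Otherwise, the held-out links leak into the features.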
Our approach still has some limitations. First, using word2vec and diffusion maps for sequence embedding, while effective, can be computationally intensive. As ncRNA and drug data continue to grow rapidly, this step may struggle with future massive datasets. Second, the attention mechanism in our graph attention network requires a large number of parameters. With extensive data, this could lead to substantial memory usage and computational demands, and hence longer training and inference times. Finally, our current model considers only the associations between ncRNAs and drugs. In the future, we aim to extend the model with other relevant information, such as ncRNA-disease associations and drug-disease interactions. Such an extension could yield a more comprehensive model that simultaneously predicts the relationships among ncRNAs, drugs, and diseases, thereby providing a more holistic view of these complex biological interactions.
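The cost of the diffusion-map step becomes clearer when the standard construction is written out. The following sketch uses NumPy to embed the rows of a matrix `x`. It assumes a Gaussian kernel with width `epsilon` and diffusion time `t`. It does not reproduce how DMGAT feeds word2vec vectors into the map, which this section does not describe. The dense n × n kernel needs O(n²) memory, and the full symmetric eigendecomposition with `numpy.linalg.eigh` costs O(n³) time.

```python
import numpy as np


def diffusion_map(
    x: np.ndarray, n_components: int = 2, epsilon: float = 1.0, t: int = 1
) -> np.ndarray:
    """Diffusion-map embedding of the rows of x (n_samples x n_features).

    Gaussian kernel -> Markov normalisation P = D^-1 K -> eigendecomposition.
    Returns the first n_components non-trivial diffusion coordinates.
    """
    # Pairwise squared Euclidean distances: a dense n x n matrix (O(n^2) memory).
    diff = x[:, None, :] - x[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dists / epsilon)
    degree = kernel.sum(axis=1)

    # P = D^-1 K is not symmetric, but A = D^-1/2 K D^-1/2 is and has the same
    # eigenvalues, so the symmetric solver eigh can be used (O(n^3) time).
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    sym = d_inv_sqrt[:, None] * kernel * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)

    # eigh returns eigenvalues in ascending order; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are psi = D^-1/2 v.
    psi = d_inv_sqrt[:, None] * eigvecs

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    keep = slice(1, n_components + 1)
    return psi[:, keep] * (eigvals[keep] ** t)
```

For large n, the usual way to reduce this cost is a sparse k-nearest-neighbour kernel combined with a partial eigensolver, for example `scipy.sparse.linalg.eigsh`.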
Key Points
We use word2vec and diffusion maps to embed non-coding RNA (ncRNA) sequences and drug simplified molecular-input line-entry system (SMILES) strings, capturing information from both the entire sequence and its subsequences.
Sensitivity associations are used with a random forest classifier to select reliable negative associations, addressing the problem of an imbalanced dataset.
A graph attention mechanism is applied to fuse the features of ncRNAs and drugs, integrating their information thoroughly.
Experimental results on benchmark datasets show that the diffusion map and heterogeneous graph attention network (DMGAT) outperforms seven other state-of-the-art approaches. A case study shows that DMGAT has the potential to identify new ncRNA-drug resistance pairs.
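One common way to expose subsequence information to word2vec is to split each sequence into overlapping k-mers ("words"). The following minimal sketch assumes k = 3 for ncRNAs. For SMILES it uses a regular-expression tokenizer that is widely used in the reaction-prediction literature. This section does not confirm either choice for DMGAT.

```python
import re
from typing import List, Sequence

# Widely used SMILES tokenisation pattern: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds, branches and ring-closure labels.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def rna_kmers(seq: str, k: int = 3) -> List[str]:
    """Overlapping k-mers (stride 1) of an RNA sequence, upper-cased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def smiles_tokens(smiles: str) -> List[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: the tokens must reassemble the input exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenise SMILES: {smiles}")
    return tokens


def build_corpus(rna_seqs: Sequence[str], smiles_list: Sequence[str], k: int = 3) -> List[List[str]]:
    """One 'sentence' of tokens per molecule, ready for a word2vec trainer."""
    return [rna_kmers(s, k) for s in rna_seqs] + [smiles_tokens(s) for s in smiles_list]
```

The resulting corpus could be passed to a word2vec trainer such as gensim's `Word2Vec(sentences=corpus, vector_size=..., window=..., min_count=1)` (gensim 4.x parameter names). This section does not give the hyperparameters DMGAT uses.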
Acknowledgments
We would like to thank the anonymous reviewers for their valuable suggestions.
Contributor Information
Tingyu Liu, School of Medicine and Health, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, China.
Qiuhao Chen, Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.
Renjie Liu, Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.
Yuzhi Sun, School of Computer Science and Technology, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.
Yadong Wang, School of Computer Science and Technology, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.
Yan Zhu, College of Veterinary Medicine, Northeast Agricultural University, 150038, Xiangfang District, Changjiang Road No. 600, Harbin, China.
Tianyi Zhao, School of Medicine and Health, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, China; Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.
Conflict of interest: None declared.
Funding
National Key R&D Program of China (2022YFF1202101).
Data availability
The raw data are released to Zenodo with identifier 13929676. The source code of DMGAT is available at https://github.com/liutingyu0616/DMGAT/tree/main.
References
- 1. Wang L, Xuan Z, Zhou S. et al. A novel model for predicting lncRNA-disease associations based on the lncRNA-miRNA-disease interactive network. Curr Bioinform 2019;14:269–78. 10.2174/1574893613666180703105258
- 2. Fan S-B, Xie X-F, Wei W. et al. Senescence-related lncRNAs: pioneering indicators for ovarian cancer outcomes. Phenomics 2024;4:379–93. 10.1007/s43657-024-00163-z
- 3. Ji Q, Jiang X, Wang M. et al. Multimodal omics approaches to aging and age-related diseases. Phenomics 2024;4:56–71. 10.1007/s43657-023-00125-x
- 4. Treiber T, Treiber N, Meister G. Regulation of microRNA biogenesis and its crosstalk with other cellular pathways. Nat Rev Mol Cell Biol 2019;20:5–20. 10.1038/s41580-018-0059-1
- 5. Zhao P, Chang J, Chen YK. et al. Cellular senescence-related long non-coding RNA signatures predict prognosis in juvenile osteosarcoma. Phenomics 2024;4:430–52. 10.1007/s43657-023-00132-y
- 6. Jeck WR, Sorrentino JA, Wang K. et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA (New York, NY) 2013;19:141–57. 10.1261/rna.035667.112
- 7. Kristensen LS, Hansen TB, Venø MT. et al. Circular RNAs in cancer: opportunities and challenges in the field. Oncogene 2018;37:555–65. 10.1038/onc.2017.361
- 8. Kristensen LS, Andersen MS, Stagsted LVW. et al. The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet 2019;20:675–91. 10.1038/s41576-019-0158-7
- 9. Lin R, Zheng S, Haiyu S. et al. Integrated transcriptome analysis of lncRNA, miRNA, and mRNA reveals key regulatory modules for polycystic ovary syndrome. Phenomics 2024;4:570–83. 10.1007/s43657-024-00183-9
- 10. Seto AG, Kingston RE, Lau NC. The coming of age for piwi proteins. Mol Cell 2007;26:603–9. 10.1016/j.molcel.2007.05.021
- 11. Zhou J, Zhou W, Zhang R. The potential mechanisms of piRNA to induce hepatocellular carcinoma in human. Med Hypotheses 2021;146:110400. 10.1016/j.mehy.2020.110400
- 12. Xie X-f, Xiao-qian H, Liu D-x. et al. Identification of a novel pyroptosis-related lncRNAs prognosis model and subtypes in ovarian cancer. Phenomics 2025. 10.1007/s43657-024-00173-x
- 13. Witusik-Perkowska M, Zakrzewska M, Jaskolski DJ. et al. Artificial microenvironment of in vitro glioblastoma cell cultures changes profile of miRNAs related to tumor drug resistance. Onco Targets Ther 2019;12:3905–18. 10.2147/OTT.S190601
- 14. Cui T, Bell EH, McElroy J. et al. A novel miR-146a-POU3F2/SMARCA5 pathway regulates stemness and therapeutic response in glioblastoma. Mol Cancer Res 2021;19:48–60. 10.1158/1541-7786.MCR-20-0353
- 15. Lanlin H, Liang Y, Kelv W. et al. Repressing PDCD4 activates JNK/ABCG2 pathway to induce chemoresistance to fluorouracil in colorectal cancer cells. Ann Transl Med 2021;9:114. 10.21037/atm-20-4292
- 16. Dai E, Yang F, Jing Wang X. et al. ncDR: a comprehensive resource of non-coding RNAs involved in drug resistance. Bioinformatics (Oxford, England) 2017;33:4010–1. 10.1093/bioinformatics/btx523
- 17. Li Y, Wang R, Zhang S. et al. LRGCPND: predicting associations between ncRNA and drug resistance via linear residual graph convolution. Int J Mol Sci 2021;22:10508.
- 18. Zhang P, Wang Z, Sun W. et al. RDRGSE: a framework for noncoding RNA-drug resistance discovery by incorporating graph skeleton extraction and attentional feature fusion. ACS Omega 2023;8:27386–97. 10.1021/acsomega.3c02763
- 19. Zheng J, Qian Y, He J. et al. Graph neural network with self-supervised learning for noncoding RNA–drug resistance association prediction. J Chem Inf Model 2022;62:3676–84. 10.1021/acs.jcim.2c00367
- 20. Li L, Pengfei W, Wang Z. et al. NoncoRNA: a database of experimentally supported non-coding RNAs and drug targets in cancer. J Hematol Oncol 2020;13:1–4.
- 21. Xinyu Cao X, Zhou FH, Huang Y-e. et al. ncRNADrug: a database for validated and predicted ncRNAs associated with drug resistance and targeted by drugs. Nucleic Acids Res 2024;52:D1393–9.
- 22. Le NQK. Predicting emerging drug interactions using GNNs. Nat Comput Sci 2023;3:1007–8. 10.1038/s43588-023-00555-7
- 23. Zhao Z, Gui J, Yao A. et al. Improved prediction model of protein and peptide toxicity by integrating channel attention into a convolutional neural network and gated recurrent units. ACS Omega 2022;7:40569–77. 10.1021/acsomega.2c05881
- 24. Luo J, Long Y. NTSHMDA: prediction of human microbe-disease association based on random walk by integrating network topological similarity. IEEE/ACM Trans Comput Biol Bioinform 2018;17:1341–51.
- 25. Chen X, Huang Y-A, You Z-H. et al. A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics (Oxford, England) 2017;33:733–9. 10.1093/bioinformatics/btw715
- 26. Zeng M, Chengqian L, Zhang F. et al. SDLDA: lncRNA-disease association prediction based on singular value decomposition and deep learning. Methods (San Diego, Calif) 2020;179:73–80.
- 27. Deepthi K, Jereesh AS. Inferring potential circRNA–disease associations via deep autoencoder-based classification. Mol Diagn Ther 2021;25:87–97. 10.1007/s40291-020-00499-y
- 28. Peng L-H, Yin J, Zhou L. et al. Human microbe-disease association prediction based on adaptive boosting. Front Microbiol 2018;9:2440. 10.3389/fmicb.2018.02440
- 29. Liu Y, Wang S-L, Zhang J-F. et al. DMFMDA: prediction of microbe-disease associations based on deep matrix factorization using Bayesian personalized ranking. IEEE/ACM Trans Comput Biol Bioinform 2020;18:1763–72.
- 30. Liu Y, Wang S-L, Zhang J-F. et al. DMFMDA: prediction of microbe-disease associations based on deep matrix factorization using Bayesian personalized ranking. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1763–72. 10.1109/TCBB.2020.3018138
- 31. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 1988;28:31–6. 10.1021/ci00057a005
- 32. Mikolov T. Efficient estimation of word representations in vector space. arXiv preprint, arXiv:1301.3781. 2013.
- 33. Řehůřek R, Sojka P. Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: University of Malta, 2010, pp. 46–50. ISBN 2-9517408-6-7.
- 34. Li Z, Liu L, Feng C. et al. LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res 2023;51:D186–91. 10.1093/nar/gkac999
- 35. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res 2019;47:D155–62. 10.1093/nar/gky1141
- 36. Knox C, Wilson M, Klinger CM. et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res 2024;52:D1265–75. 10.1093/nar/gkad976
- 37. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics (Oxford, England) 2011;27:3036–43. 10.1093/bioinformatics/btr500
- 38. Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics (Oxford, England) 2015;31:2989–98. 10.1093/bioinformatics/btv325
- 39. Nestorowa S, Hamey FK, Sala BP. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood, J Am Soc Hematol 2016;128:e20–31. 10.1182/blood-2016-05-716480
- 40. Zhang H, Ming Z, Fan C. et al. A path-based computational model for long non-coding RNA-protein interaction prediction. Genomics 2020;112:1754–60. 10.1016/j.ygeno.2019.09.018
- 41. Liu L, Zhou Y, Lei X. RMDGCN: prediction of RNA methylation and disease associations based on graph convolutional network with attention mechanism. PLoS Comput Biol 2023;19:e1011677. 10.1371/journal.pcbi.1011677
- 42. Zhang J, Hu X, Jiang Z. et al. Predicting disease-related RNA associations based on graph convolutional attention network. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 177–82. IEEE.
- 43. Mudiyanselage TB, Lei X, Senanayake N. et al. Predicting CircRNA disease associations using novel node classification and link prediction models on graph convolutional networks. Methods (San Diego, Calif) 2022;198:32–44. 10.1016/j.ymeth.2021.10.008
- 44. Honglin S, Gao H. PDA-GCN: predicting piwi-interacting RNA-disease associations based on graph convolution network. In: 2023 11th International Conference on Bioinformatics and Computational Biology (ICBCB), pp. 118–22. IEEE.
- 45. Liu Y, Zhang F, Ding Y. et al. MRDPDA: a multi-Laplacian regularized deepFM model for predicting piRNA-disease associations. J Cell Mol Med 2024;28:e70046. 10.1111/jcmm.70046
- 46. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inform Process Syst 2016;29.
- 47. Bian C, Lei X-J, Fang-Xiang W. GATCDA: predicting circRNA-disease associations based on graph attention network. Cancers 2021;13:2595.
- 48. Cao R, He C, Wei P. et al. Prediction of circRNA-disease associations based on the combination of multi-head graph attention network and graph convolutional network. Biomolecules 2022;12:932. 10.3390/biom12070932
- 49. Veličković P, Cucurull G, Casanova A. et al. Graph attention networks. arXiv preprint, arXiv:1710.10903. 2017.
- 50. Liu Y, Min W, Miao C. et al. Neighborhood regularized logistic matrix factorization for drug-target interaction prediction. PLoS Comput Biol 2016;12:e1004760. 10.1371/journal.pcbi.1004760
- 51. Niu Y, Song C, Gong Y. et al. MiRNA-drug resistance association prediction through the attentive multimodal graph convolutional network. Front Pharmacol 2022;12:799108. 10.3389/fphar.2021.799108
- 52. Wang Y, Yang L, Chang Y. MiR-26b enhances drug resistance of pancreatic cancer cells to gemcitabine by inhibiting p53 gene expression. Int J Lab Med 2017;3073–6.
- 53. Wang C-Q. MiR-195 reverses 5-FU resistance through targeting HMGA1 in gastric cancer cells. Eur Rev Med Pharmacol Sci 2019;23.
- 54. Li Y, VandenBoom TG, Kong D. et al. Up-regulation of miR-200 and let-7 by natural agents leads to the reversal of epithelial-to-mesenchymal transition in gemcitabine-resistant pancreatic cancer cells. Cancer Res 2009;69:6704–12. 10.1158/0008-5472.CAN-09-1298
- 55. Li J, Chen Y, Jin M. et al. MicroRNA-134 reverses multidrug resistance in human lung adenocarcinoma cells by targeting FOXM1. Oncol Lett 2017;13:1451–5. 10.3892/ol.2017.5574
- 56. Cao X-G, Zhao R, Zhu C. et al. BC200 LncRNA a potential predictive marker of poor prognosis in esophageal squamous cell carcinoma patients. Onco Targets Ther 2016;2221. 10.2147/OTT.S99401
- 57. Mégret L, Mendoza C, Lobo MA. et al. Precision machine learning to understand micro-RNA regulation in neurodegenerative diseases. Front Mol Neurosci 2022;15:914830.