Briefings in Bioinformatics. 2025 Apr 19;26(2):bbaf179. doi: 10.1093/bib/bbaf179

DMGAT: predicting ncRNA-drug resistance associations based on diffusion map and heterogeneous graph attention network

Tingyu Liu, Qiuhao Chen, Renjie Liu, Yuzhi Sun, Yadong Wang, Yan Zhu, Tianyi Zhao
PMCID: PMC12008124  PMID: 40251829

Abstract

Non-coding RNAs (ncRNAs) play crucial roles in drug resistance and sensitivity, making them important biomarkers and therapeutic targets. However, predicting ncRNA-drug associations is challenging due to issues such as dataset imbalance and sparsity, limiting the identification of robust biomarkers. Existing models often fall short in capturing local and global sequence information, limiting the reliability of predictions. This study introduces DMGAT (diffusion map and heterogeneous graph attention network), a novel deep learning model designed to predict ncRNA-drug associations. DMGAT integrates diffusion maps for sequence embedding, graph convolutional networks for feature extraction, and GAT for heterogeneous information fusion. To address dataset imbalance, the model incorporates sensitivity associations and employs a random forest classifier to select reliable negative samples. DMGAT embeds ncRNA sequences and drug SMILES using the word2vec technique, capturing local and global sequence information. The model constructs a heterogeneous network by combining sequence similarity and Gaussian Interaction Profile kernel similarity, providing a comprehensive representation of ncRNA-drug interactions. Evaluated through five-fold cross-validation on a curated dataset from NoncoRNA and ncDR, DMGAT outperforms seven state-of-the-art methods, achieving the highest area under the receiver operating characteristic curve (0.8964), area under the precision-recall curve (0.8984), recall (0.9576), and F1-score (0.8285). The raw data are released to Zenodo with identifier 13929676. The source code of DMGAT is available at https://github.com/liutingyu0616/DMGAT/tree/main.

Keywords: ncRNA-drug resistance association prediction, diffusion map, graph convolution network, graph attention network

Introduction

Non-coding RNAs (ncRNAs) play important roles in cells’ development, differentiation, and apoptosis processing. Numerous studies have shown that ncRNAs are extensively involved in human pathological pathways. As biomarkers, they offer new targets for treating diseases such as cancer [1–3]. Non-coding RNAs, such as microRNAs (miRNAs), circular RNAs (circRNAs), long non-coding RNAs (lncRNAs), and Piwi-interacting RNAs (piRNAs), have garnered significant interest from researchers. Recent studies suggest that ncRNAs are involved in various aspects of tumor cell drug resistance, including epithelial–mesenchymal transition, DNA repair, drug efflux and metabolism, and cell cycle progression, highlighting the potential significance of ncRNAs in drug therapy [4, 5]. Different ncRNAs have different functions according to their length and structure. miRNAs are short regulatory biomolecules involved in post-transcriptional regulation of gene expression. Compared to linear miRNAs, circRNAs [6] are more stable and may function as carriers or scaffolds [7]. They regulate protein function by acting as microRNA or protein inhibitors, or they can be translated to perform important biological functions [8, 9]. lncRNAs can play a role in regulating synergistic proteins. In contrast, piRNAs have been studied relatively less. Extending across 24–35 nucleotides [10–12], piRNAs interact with proteins of the Piwi subfamily and are essential for the suppression of transposable elements, protection of the genome, and histone modification, among other roles, influencing gene expression and primarily suppressing transposon activity. There are synergistic interactions between different RNAs. For example, lncRNAs can act as molecular sponges for miRNAs, regulating the expression of their target genes.

The increasing volume of studies connecting abnormal ncRNA expression to various drugs highlights the potential of these ncRNAs as valuable diagnostic markers and therapeutic targets. More specifically, the antagonistic interactions between certain ncRNAs and specific types of drugs can affect protein expression, which in turn influences human metabolism. For example, miR-181a interacts with drugs by modulating molecular pathways involved in glioblastoma drug resistance, potentially altering the tumor’s responsiveness to treatments [13]. One study shows miR-181a influences drug efficacy by regulating key molecular mechanisms related to glioblastoma drug resistance, potentially affecting treatment outcomes [14]. miR-181a modulates drug resistance in glioblastoma by influencing molecular pathways that control tumor sensitivity to chemotherapy, potentially impacting treatment success [15].

With the decreasing cost and continued advancement of sequencing technology, many databases now offer a wealth of resistance and sensitivity associations between ncRNAs and drugs, including both experimentally validated and predicted associations. The ncDR database [16] has collected 5864 experimentally verified resistance relationships between 1039 ncRNAs (162 lncRNAs and 877 miRNAs) and 145 compounds from around 900 published studies. It has been used in many ncRNA association prediction studies, such as LRGCPND [17], RDRGSE [18], and GSLRDA [19]. NoncoRNA [20] provides a comprehensive database for searching drug resistance/sensitivity-related ncRNAs in different human cancers. ncRNADrug [21] is a comprehensive, integrated database of manually curated and predicted resistance/sensitivity associations between ncRNAs and drugs. These databases provide a wealth of knowledge for understanding the associations between ncRNAs and drugs in humans.

To date, numerous ncRNA-drug associations have been validated through experimental biological methods. However, progress in this field has been constrained by the significant time and resource investments required for such validation [22, 23]. In response to these limitations, researchers have developed various computational algorithms to efficiently explore and map ncRNA-drug networks. For example, the NTSHMDA [24] model constructs a heterogeneous network integrating disease and microbe similarities with known associations, using a weighted random walk algorithm based on network topology. In KATZHMDA [25], a heterogeneous network combines multisource miRNA and disease similarity networks with known associations, predicting links via the KATZ algorithm. SDLDA [26] applies singular value decomposition and deep learning to predict lncRNA-disease associations. AE-RF [27] combines deep autoencoder and random forest for circRNA-disease predictions. ABHMDA [28] uses k-means clustering to select reliable negative samples and an adaptive boosting classifier for microbe-disease predictions. DMFCDA [29] employs deep matrix decomposition with a fully connected projection layer to extract latent circRNA-disease features, feeding them into a neural network. In DMFMDA [30], one-hot encoded microbe and disease data is transformed to low-dimensional vectors through an embedding layer, with predictions made via matrix decomposition in a neural network.

Although the methods above have demonstrated good performance, there is still room to improve how ncRNA and drug features are mined by using more comprehensive information. On the one hand, current methods such as NTSHMDA and DMFMDA do not take the sequence information of RNAs and drugs into account and rely solely on association matrices to derive features. This approach loses much of the valuable information contained in the sequences. On the other hand, existing models focus only on resistance and treat all other associations as unknown. As a result, all unknown associations are excluded from parameter updates, overlooking the sensitivity associations among them.

To overcome these limitations, we introduce the diffusion map and heterogeneous graph attention network (DMGAT) model, which combines an attention-based graph convolutional network with sensitivity associations in a diffused space obtained using a diffusion map. The diffusion map makes ncRNA and drug features more continuous in the manifold space, while the GAT uses attention mechanisms to capture the association features between ncRNAs and drugs. The key contributions of our work are as follows:

  • We introduce DMGAT, a novel deep learning model that integrates diffusion maps with graph convolutional and attention networks for accurate prediction of ncRNA-drug associations.

  • DMGAT employs the word2vec technique to embed ncRNA sequences and drug SMILES while constructing a heterogeneous network that combines sequence and Gaussian interaction profile similarities.

  • Our approach tackles data imbalance and sparsity by incorporating sensitivity associations and using a random forest classifier to select reliable negative samples.

Our source code and data are available on GitHub (https://github.com/liutingyu0616/DMGAT/tree/main).

Materials

Dataset

The manually curated ncRNA-drug association datasets NoncoRNA [20], ncDR [16], and ncRNADrug [21] were collected as the benchmark dataset for our work. The resistance associations come from NoncoRNA and ncDR, and the sensitivity associations come from ncRNADrug.

  • NoncoRNA

    NoncoRNA contains 8233 ncRNA-drug resistance associations between 5568 ncRNAs and 154 drugs in 134 cancers [20]. We used the February 2020 version, which is released at http://www.ncdtcdb.cn:8080/NoncoRNA.

  • ncDR

    ncDR is an aggregated dataset that contains manually curated verified and predicted ncRNA-drug associations. We used the June 2016 version of ncDR, which involves 145 drugs and 1039 ncRNAs (877 miRNAs and 162 lncRNAs) from around 900 published studies [16]. The dataset is publicly released at https://www.mdpi.com/1422-0067/22/19/10508.

  • ncRNADrug

    It comprises ncRNAs linked to drug resistance, along with ncRNAs that are drug targets, experimentally confirmed and computationally predicted. Regarding experimentally validated entries, ncRNADrug includes 29 551 resistance records between 9195 ncRNAs (2248 miRNAs, 4145 lncRNAs, and 2802 circRNAs) and 266 drugs [21]. Additionally, it contains 32 969 records involving 10 480 ncRNAs (4338 miRNAs, 6087 lncRNAs, and 55 circRNAs) that are targeted by 965 drugs.

We selected only experimentally verified associations from these three datasets. From NoncoRNA and ncDR, we removed redundant and ambiguous associations, as well as those in which a lncRNA or miRNA is linked to only a single drug resistance association. From ncRNADrug, we selected the sensitivity associations and removed redundant ones. After preprocessing, we obtained 2693 resistance associations and 408 sensitivity associations between 622 ncRNAs (41 lncRNAs and 581 miRNAs) and 121 drugs. The dataset can be denoted as,

$$\mathcal{D} = \mathcal{D}_{R} \cup \mathcal{D}_{S} \cup \mathcal{D}_{U} \tag{1}$$

where $\mathcal{D}_{R}$ represents the resistance associations, which contains 2689 ncRNA-drug resistance entries; $\mathcal{D}_{S}$ represents the sensitivity associations, which contains 408 ncRNA-drug sensitivity associations; and $\mathcal{D}_{U}$ represents the associations that have not been verified experimentally, of which there are 72576. Known associations account for 3.7% of the total associations.

Methodology

We propose DMGAT, an ncRNA-drug association predictor based on a graph convolutional network and a graph attention network in diffusion map space. A random forest classifier and sensitivity associations are introduced to select reliable negative associations during training, in order to tackle the dataset imbalance problem. The workflow of DMGAT is shown in Fig. 1. There are three main steps: (i) embedding ncRNA sequences and drug SMILES using the word2vec technique, reducing the dimension with a diffusion map, and computing the similarity in diffusion map space; (ii) extracting the ncRNA and drug features using separate graph convolutional networks; and (iii) predicting associations using a graph attention network.

Figure 1.

The flowchart of DMGAT. (a) ncRNA and drug node embedding and similarity calculation. The ncRNA sequences and drug SMILES are split into 3-mers, which are then replaced by their corresponding word2vec feature vectors. The features are flattened to 2D using a deep learning network, and a diffusion map is used to reduce their dimension. Sequence similarity is obtained by calculating the Euclidean distance between features. (b) Feature extraction using GCN. Two separate homogeneous graph convolutional networks are constructed to extract ncRNA and drug features, respectively, based on their diffusion map features and similarities. (c) Predicting resistance association scores using a graph attention network. The ncRNA and drug features are combined through an attention mechanism, and the two feature matrices are multiplied to obtain the predicted adjacency matrix.

ncRNA sequence and drug SMILES embedding

In most previous studies, the combination of the corresponding row and column of the adjacency matrix between ncRNAs and drugs was commonly regarded as the feature of each association. However, the adjacency matrix is sparse and cannot accurately describe the differences between ncRNA-drug resistance pairs. It also loses the information contained in the ncRNA sequences and in the SMILES (simplified molecular input line entry system) strings of drugs [31]. SMILES is a notation used to represent the structure of drugs. It encodes the molecular structure of a compound by using specific characters to denote atoms, bonds, and connectivity.

Word2Vec encoding

To avoid the sparsity problem of the feature matrix, we adopted word2vec [32], a sequence embedding technique widely used in natural language processing. We used the word2vec implementation in the gensim library [33], which uses the continuous bag-of-words (CBOW) model by default. We split each ncRNA sequence and each drug SMILES string into 3-mers and used the features of these 3-mers to represent the whole sequence. This method not only represents the information contained in local subsequences but also accounts for the positional relationships between 3-mers, reflecting the patterns inherent to the sequence and preserving order information.
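As a minimal sketch of the tokenization step, the snippet below splits a sequence into 3-mers that could then serve as the "words" passed to a word2vec trainer. Whether the 3-mers overlap is not stated in the text, so the stride of 1 is an assumption, and the function name `to_kmers` and the two input strings are hypothetical.

```python
def to_kmers(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into k-mers taken every `stride` characters.

    stride=1 (overlapping k-mers) is an assumption; the text only says
    that sequences are split into 3-mers.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


# Hypothetical inputs for illustration only.
rna_tokens = to_kmers("UGAGGUAGUAG")   # an RNA-like string
smiles_tokens = to_kmers("CC(=O)O")    # a short SMILES string

print(rna_tokens)
print(smiles_tokens)
```

Each token list plays the role of one sentence in the word2vec corpus; sequences shorter than `k` yield an empty list and would need separate handling.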

The sequences information of lncRNAs are obtained from LncBook 2.0 (https://ngdc.cncb.ac.cn/lncbook/home) [34]. The sequences of miRNAs are obtained from miRbase (http://mirbase.org/) [35]. The SMILES of drugs are obtained from DrugBank (https://go.drugbank.com/), which is a “gold standard” knowledge base for drug-related information, including drug-target interactions and other pharmaceutical data [36].

After word2vec encoding, each feature of ncRNA and drug is a 2D array, so the feature of each type of ncRNA is a combination of many 2D arrays, that is a 3D array. For subsequent data processing, it is necessary to further encode the 3D array into a 2D array. Here, a fully connected neural network layer is used to reduce the features from 3D to 2D. The optimization direction of the loss function is to make the variance of the compressed 2D array as large as possible. This setting aims to make the features after linear compression more dispersed in the feature space and reflect differences between different ncRNAs and drugs.

Diffusion map

Given that RNA mutates only a few bases at a time, the characteristics of RNA may change continuously along a certain path. For drugs, we assume that those with similar SMILES have similar functional properties. We therefore use the diffusion map method to compress the features of ncRNAs and drugs. Diffusion maps [37] are a powerful tool for encoding long sequences by capturing the intrinsic geometric structure of high-dimensional data. This technique constructs a diffusion process based on the pairwise similarities between sequences, which allows for the preservation of both local and global relationships in the sequences of ncRNAs and drugs [38]. Unlike linear dimensionality reduction methods such as PCA, diffusion maps excel at identifying non-linear patterns and can capture continuously changing features, which are critical in biological datasets where the relationships between features are often complex. The use of diffusion maps for sequence encoding ensures robustness to noise, preserves the biological manifold structure, and provides an effective means for dimensionality reduction, facilitating further downstream analysis [39]. The numbers of eigenvectors computed in the diffusion map for lncRNAs, miRNAs, and drugs are 9, 150, and 40, respectively, which is approximately a quarter of their feature dimensions. The scale parameter epsilon is determined by the algorithm of Berry, Harlim, and Giannakis. The normalization parameter alpha is set to 1, which ensures probability conservation, and a ball tree algorithm is used for nearest-neighbor search.

A fully connected layer was used to extract features from the diffusion map space in order to unify the feature dimensions of each kind of ncRNA and of the drugs.
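The diffusion map step can be illustrated with a small NumPy sketch of the standard construction: a Gaussian kernel, alpha-normalization, row normalization into a Markov matrix, and an eigendecomposition. This is not the authors' implementation. The fixed `epsilon`, the dense kernel (no nearest-neighbor search), and the function name `diffusion_map` are assumptions made for brevity; only `alpha = 1` is taken from the text.

```python
import numpy as np


def diffusion_map(X, n_components=2, epsilon=1.0, alpha=1.0, t=1):
    """Return diffusion coordinates and the sorted eigenvalues.

    epsilon is fixed here (assumption); the text selects it automatically.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)       # clip rounding noise
    K = np.exp(-sq_dists / epsilon)               # Gaussian kernel

    q = K.sum(axis=1)
    K_alpha = K / np.outer(q ** alpha, q ** alpha)  # alpha-normalization

    d = K_alpha.sum(axis=1)
    # Symmetric matrix with the same spectrum as the Markov matrix D^-1 K_alpha.
    M = K_alpha / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of the Markov matrix; the first one is constant.
    psi = eigvecs / np.sqrt(d)[:, None]
    coords = (eigvals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
    return coords, eigvals


# Toy data: 20 random feature vectors of dimension 5 (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
coords, eigvals = diffusion_map(X, n_components=3, epsilon=2.0)
print(coords.shape)
```

The leading eigenvalue of a Markov matrix is 1 with a constant eigenvector, which carries no information, so the sketch skips it and keeps the next `n_components` coordinates.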

ncRNA-drug adjacency network construction

The adjacency matrix indicates whether there is a resistance association between each ncRNA and each drug. It can be denoted as,

$$A = (a_{ij}) \in \{0, 1\}^{N_r \times N_d} \tag{2}$$

where $N_r$ is the number of ncRNAs and $N_d$ is the number of drugs. $a_{ij} = 1$ if there is a resistance association between the $i$th ncRNA and the $j$th drug, and $a_{ij} = 0$ otherwise.
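For illustration, a binary adjacency matrix of this kind can be built from a list of known resistance pairs as sketched below. The identifiers (`ncRNA_1`, `drug_A`, and so on) and the helper name `build_adjacency` are hypothetical placeholders, not entries from the curated dataset.

```python
import numpy as np


def build_adjacency(pairs, ncrnas, drugs):
    """Binary ncRNA-by-drug matrix: 1 where a resistance pair is known."""
    row = {name: i for i, name in enumerate(ncrnas)}
    col = {name: j for j, name in enumerate(drugs)}
    A = np.zeros((len(ncrnas), len(drugs)), dtype=np.int8)
    for ncrna, drug in pairs:
        A[row[ncrna], col[drug]] = 1
    return A


# Hypothetical identifiers, for illustration only.
ncrnas = ["ncRNA_1", "ncRNA_2", "ncRNA_3"]
drugs = ["drug_A", "drug_B"]
pairs = [("ncRNA_1", "drug_A"), ("ncRNA_3", "drug_B"), ("ncRNA_3", "drug_A")]

A = build_adjacency(pairs, ncrnas, drugs)
print(A)
```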

ncRNA similarity matrix

The similarity between ncRNAs is obtained from the sequence of each ncRNA and the adjacency matrix, which can be denoted as,

graphic file with name DmEquation3.gif (3)

The sequence similarity is obtained by computing the Euclidean distance between each pair of ncRNAs in diffusion map space, which can be represented as,

graphic file with name DmEquation4.gif (4)

where $r_i$ and $r_j$ are the $i$th and $j$th ncRNA, respectively.

The Gaussian interaction profile (GIP) kernel similarity has been widely used to assess the similarity between two nodes in the prediction of ncRNA-drug associations [38, 40]. It is based on the assumption that ncRNAs with similar interaction profiles tend to exhibit similar interaction patterns with drugs, and vice versa. It captures the similarity between these entities by applying a Gaussian kernel to interaction profiles derived from the available interaction data. The GIP kernel similarity between the $i$th and $j$th ncRNA is as follows [37],

$$S_r^{\mathrm{GIP}}(r_i, r_j) = \exp\left(-\gamma_r \lVert A_{i\cdot} - A_{j\cdot} \rVert^2\right) \tag{5}$$

where $A_{i\cdot}$ and $A_{j\cdot}$ are the $i$th and $j$th row vectors of the adjacency matrix $A$, and $\gamma_r$ is the kernel width coefficient, which is defined as,

$$\gamma_r = 1 \Big/ \left(\frac{1}{N_r} \sum_{i=1}^{N_r} \lVert A_{i\cdot} \rVert^2\right) \tag{6}$$

where $N_r$ is the total number of lncRNAs and miRNAs, and $A_{i\cdot}$ is the $i$th row vector of the adjacency matrix $A$.
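The GIP kernel can be sketched in a few lines of NumPy. The version below assumes the commonly used form, exp(-gamma * squared distance between interaction profiles), with gamma equal to the reciprocal of the mean squared profile norm; any additional bandwidth constant the authors may use is not recoverable from the text. The function name `gip_kernel` and the toy matrix are hypothetical.

```python
import numpy as np


def gip_kernel(profiles):
    """GIP kernel similarity between the rows of a binary profile matrix."""
    P = np.asarray(profiles, dtype=float)
    sq_norms = (P ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = 1.0 / mean_sq_norm                    # kernel width coefficient
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * P @ P.T
    np.maximum(sq_dists, 0.0, out=sq_dists)       # clip rounding noise
    return np.exp(-gamma * sq_dists)


# Toy adjacency matrix: 3 ncRNAs (rows) by 3 drugs (columns).
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

S_r_gip = gip_kernel(A)      # ncRNA similarity from row profiles
S_d_gip = gip_kernel(A.T)    # drug similarity from column profiles
print(np.round(S_r_gip, 4))
```

Passing the transposed matrix reuses the same function for drugs, since a drug's interaction profile is a column of the adjacency matrix.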

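The integrated similarity between ncRNAs is calculated by taking the average of the sequence similarity and the GIP kernel similarity,

$$S_r = \frac{S_r^{\mathrm{seq}} + S_r^{\mathrm{GIP}}}{2} \tag{7}$$

where $S_r^{\mathrm{seq}}$ is the ncRNA sequence similarity matrix and $S_r^{\mathrm{GIP}}$ is the ncRNA GIP kernel similarity matrix.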

Drug similarity matrix

The similarity between drugs is obtained from the SMILES of each drug and the adjacency matrix. The SMILES similarity between drugs is calculated in the same way as the sequence similarity between ncRNAs, which can be denoted as,

graphic file with name DmEquation8.gif (8)

where $d_i$ and $d_j$ are the $i$th and $j$th drug, respectively.

The GIP kernel similarity between the $i$th and $j$th drug is as follows:

$$S_d^{\mathrm{GIP}}(d_i, d_j) = \exp\left(-\gamma_d \lVert A_{\cdot i} - A_{\cdot j} \rVert^2\right) \tag{9}$$

where $A_{\cdot i}$ and $A_{\cdot j}$ are the $i$th and $j$th column vectors of the adjacency matrix $A$, and $\gamma_d$ is the kernel width coefficient, which is defined as,

$$\gamma_d = 1 \Big/ \left(\frac{1}{N_d} \sum_{i=1}^{N_d} \lVert A_{\cdot i} \rVert^2\right) \tag{10}$$

where $N_d$ is the total number of drugs, and $A_{\cdot i}$ is the $i$th column vector of the adjacency matrix $A$.

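The integrated similarity between drugs is calculated by taking the average of the SMILES similarity and the GIP kernel similarity,

$$S_d = \frac{S_d^{\mathrm{seq}} + S_d^{\mathrm{GIP}}}{2} \tag{11}$$

where $S_d^{\mathrm{seq}}$ is the drug SMILES similarity matrix and $S_d^{\mathrm{GIP}}$ is the drug GIP kernel similarity matrix.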

Feature extraction using GCN

GCN has been widely used for node feature aggregation because of its ability to capture hidden graph relations and propagate information through the network [19, 41–45]. It is a type of neural network that efficiently extracts features from networks by aggregating information from neighboring nodes. ncRNAs and drugs can each be regarded as nodes in their own graph, with the corresponding similarity matrix serving as the transition matrix for feature propagation [46]. The GCN takes the ncRNA or drug features and the corresponding similarity matrix as input and performs graph convolution operations to fuse their hidden features.

In GCN, the output of the $l$th layer is treated as the input for the $(l+1)$th layer to capture higher dimensional features. The node embedding at the $(l+1)$th layer is given by:

$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} S D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{12}$$

where $S$ is the integrated similarity matrix obtained by Equations 7 and 11 for ncRNAs and drugs, respectively, $D$ is the degree matrix of $S$, and $H^{(l)}$ is the ncRNA embedding of the $l$th layer, with the initial embedding being the concatenated features of the two kinds of ncRNAs. For the drug graph convolution network, $H^{(0)}$ is the drug feature obtained from diffusion map space. $W^{(l)}$ is the trainable parameter matrix, and $\sigma$ is the activation function ReLU. In this way, the features of ncRNAs and drugs propagated by the GCN are obtained, respectively. The number of GCN layers for both ncRNAs and drugs is 2.
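The propagation rule can be sketched as a single NumPy function. The sketch assumes the widely used symmetric normalization of the similarity matrix by its degree matrix, followed by a linear map and ReLU; the exact normalization used by the authors is not recoverable from the text. The function name `gcn_layer` and the random toy inputs are hypothetical.

```python
import numpy as np


def gcn_layer(S, H, W):
    """One graph convolution: ReLU(D^-1/2 S D^-1/2 H W).

    S is a non-negative similarity matrix whose row sums (degrees) are
    assumed to be strictly positive.
    """
    deg = S.sum(axis=1)
    inv_sqrt_deg = 1.0 / np.sqrt(deg)
    S_norm = S * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    return np.maximum(S_norm @ H @ W, 0.0)


# Toy example: 4 nodes, two stacked layers (the text uses 2 GCN layers).
rng = np.random.default_rng(0)
S = rng.random((4, 4))
S = (S + S.T) / 2.0                # symmetric toy similarity
np.fill_diagonal(S, 1.0)
H0 = rng.normal(size=(4, 6))       # initial node features
W0 = rng.normal(size=(6, 5))       # stands in for trained weights
W1 = rng.normal(size=(5, 3))

H2 = gcn_layer(S, gcn_layer(S, H0, W0), W1)
print(H2.shape)
```

In a trained model the weight matrices would be learned by backpropagation; here they are random and only demonstrate the shapes and the data flow.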

Feature extraction using GAT

The above homogeneous GCN only considers the similarity relationship between ncRNA and drug themselves, and the relationship between ncRNA and drug is integrated using GAT [47, 48]. GAT is a neural network designed to operate on graph-structured data by applying attention mechanisms to learn the importance of neighboring nodes [49]. In DMGAT, the GAT layer was used to aggregate the feature of ncRNAs and drugs based on the adjacency matrix to capture the high dimensional features of ncRNAs and drugs.

For the GAT, the input is the ncRNA and drug features obtained from the GCN and the adjacency matrix between them. The output of the $(l+1)$th layer is as follows,

$$h_i^{(l+1)} = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j^{(l)}\right) \tag{13}$$

where $\sigma$ is the non-linear activation function ReLU. $\alpha_{ij}$ represents the normalized attention coefficient between node $i$ and its neighbor node $j$. It essentially measures the importance or relevance of node $j$ to node $i$ when aggregating information from its neighbors, which is defined as,

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LR}\left(a^{\top}\left[W h_i^{(l)} \,\Vert\, W h_j^{(l)}\right]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LR}\left(a^{\top}\left[W h_i^{(l)} \,\Vert\, W h_k^{(l)}\right]\right)\right)} \tag{14}$$

where $\mathcal{N}_i$ is the set of neighbors of node $i$, $a$ is a learnable weight vector for computing attention, LR is the activation function LeakyReLU, $W$ is a weight matrix applied to the node features, and $\Vert$ denotes concatenation. ncRNAs and drugs obtain new integrated features after GAT fusion. The number of GAT layers for ncRNAs and drugs is 4.
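A single-head, dense version of this attention mechanism can be sketched in NumPy as follows. It assumes the standard GAT formulation: a shared linear map, LeakyReLU scores on concatenated node pairs, and a softmax over each node's neighbors. The self-loops in the toy heterogeneous graph are an assumption that keeps every softmax well defined, and the names `gat_layer` and `leaky_relu` are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, adj, W, a):
    """Single-head graph attention over a dense adjacency mask.

    Every node is assumed to have at least one neighbor (for example a
    self-loop), otherwise its softmax is undefined.
    """
    Z = H @ W                                    # transformed features
    f = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:f].z_i + a[f:].z_j
    scores = leaky_relu((Z @ a[:f])[:, None] + (Z @ a[f:])[None, :])
    scores = np.where(adj > 0, scores, -np.inf)  # keep neighbors only
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return np.maximum(alpha @ Z, 0.0), alpha     # ReLU output, attention


# Toy heterogeneous graph: 3 ncRNAs and 2 drugs linked by A, plus self-loops.
A = np.array([[1, 0], [1, 1], [0, 1]])
n_r, n_d = A.shape
adj = np.block([[np.eye(n_r), A], [A.T, np.eye(n_d)]])

rng = np.random.default_rng(0)
H = rng.normal(size=(n_r + n_d, 4))
W = rng.normal(size=(4, 3))
a = rng.normal(size=6)

H_next, alpha = gat_layer(H, adj, W, a)
print(np.round(alpha, 3))
```

Each row of `alpha` sums to 1 and is zero outside the node's neighborhood, so the layer output is a learned weighted average of neighbor features.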

Selecting reliable negative associations

Because of limitations such as sequencing technology and cost, many associations between ncRNAs and drugs have not yet been discovered. In many graph-network-based studies of ncRNA-drug association prediction, unlabeled associations are treated as negative samples during training [17–19]. Treating all unlabeled examples as negative instances may bias the learning process, because these samples do not accurately reflect the true negative class. This approach can distort the dataset’s distribution, degrade the performance of the trained model, and introduce a class imbalance problem. We therefore introduced the sensitivity associations between ncRNAs and drugs as negative samples. Because negative samples must be associations that are not resistant, experimentally verified sensitivity associations are the most reliable choice of negative samples: biological experiments ensure that these associations are not resistant. However, the number of sensitivity associations is much smaller than the number of positive associations, only about 15% of the latter. Therefore, we train a random forest classifier with resistance associations as positive samples and sensitivity associations as negative samples, and select the unlabeled samples with the lowest scores as reliable negative samples, so that the known and predicted sensitivity associations together equal the positive samples in quantity, thus resolving the imbalance.
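The balancing step can be sketched independently of the classifier. The snippet below assumes that some classifier (a random forest in the text) has already assigned each unlabeled pair a resistance score; it then keeps the known sensitivity pairs and adds the lowest-scoring unlabeled pairs until the negatives match the positives in number. The function name, the pair identifiers, and the scores are all hypothetical.

```python
def select_reliable_negatives(unlabeled_scores, n_positive, known_negatives):
    """Return known negatives plus the lowest-scoring unlabeled pairs.

    unlabeled_scores maps an (ncRNA, drug) pair to a predicted resistance
    score; lower scores mean the pair is less likely to be resistant.
    """
    n_needed = max(n_positive - len(known_negatives), 0)
    ranked = sorted(unlabeled_scores.items(), key=lambda item: item[1])
    predicted = [pair for pair, _ in ranked[:n_needed]]
    return list(known_negatives) + predicted


# Hypothetical pairs and scores, for illustration only.
unlabeled_scores = {
    ("ncRNA_1", "drug_A"): 0.90,
    ("ncRNA_1", "drug_B"): 0.10,
    ("ncRNA_2", "drug_A"): 0.40,
    ("ncRNA_2", "drug_B"): 0.05,
}
known_negatives = [("ncRNA_3", "drug_A")]

negatives = select_reliable_negatives(unlabeled_scores, n_positive=3,
                                      known_negatives=known_negatives)
print(negatives)
```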

Results

Performance evaluation

To evaluate the performance of DMGAT systematically and objectively, the five-fold cross validation was utilized while training. The experimental confirmed ncRNA-drug resistance association set is denoted as Inline graphic. The unlabelled association set is denoted as Inline graphic. The union of verified ncRNA-drug sensitivity association set and predicted reliable sensitivity association set is denoted as Inline graphic. They can be divided into five subsets with the same size as follows,

graphic file with name DmEquation15.gif (15)

where Inline graphic.

The training set and test set can be denoted as follows:

graphic file with name DmEquation16.gif (16)

where Inline graphic, Inline graphic is the notion of complement operation. Note that the training set differs for each fold, so the GIP kernel similarity matrices of ncRNAs and drugs need to be recalculated from the current training set.
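A five-fold partition of one association set can be sketched with the standard library alone. The helper names `k_fold_indices` and `split_fold` and the fixed seed are hypothetical; the same procedure would be applied separately to the positive and the negative sets, and any similarity that depends on known associations would be recomputed from the training part of each fold.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def split_fold(folds, held_out):
    """Use one fold as the test set and the remaining folds for training."""
    test = list(folds[held_out])
    train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    return train, test


folds = k_fold_indices(12, k=5, seed=0)
train, test = split_fold(folds, held_out=0)
print(len(train), len(test))
```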

The area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), accuracy, precision, recall, and the F1-score are commonly employed to assess the performance of prediction models when addressing class imbalance problems.
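For reference, the threshold-based metrics and the AUC can be computed as in the sketch below. The AUC uses the rank interpretation (the probability that a random positive is scored above a random negative, with ties counted as one half), which is adequate for small examples but quadratic in the number of samples. The labels and scores are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = resistance)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)
```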

Comparison with state-of-the-art methods

We compared the performance of DMGAT with seven state-of-the-art models on the same dataset, including NTSHMDA [24], KATZHMDA [25], SDLDA [26], AE-RF [27], ABHMDA [28], DMFCDA [29], and DMFMDA [30].

The performance of these models is shown in Table 1. The DMGAT model demonstrates significant advantages in ncRNA-drug association prediction compared to other methods. It achieves the highest AUC (0.8964), AUPR (0.8984), recall (0.9576), and F1-score (0.8285), indicating superior predictive performance. These results suggest that DMGAT excels in overall prediction accuracy and its capacity to handle imbalanced data.

Table 1.

Performance comparison among different methods

Methods AUC AUPR Recall F1-score
NTSHMDA 0.7142 0.6391 0.4470 0.5289
KATZHMDA 0.7544 0.8048 0.6223 0.6964
SDLDA 0.8258 0.8663 0.6978 0.7588
AE-RF 0.8390 0.8535 0.6127 0.6881
ABHMDA 0.8428 0.8516 0.8413 0.7756
DMFCDA 0.8449 0.8649 0.7463 0.7499
DMFMDA 0.8491 0.8546 0.8288 0.8269
DMGAT 0.8964 0.8984 0.9576 0.8285

Numbers in bold indicate the best performance.

Compared to existing models, our model uses sequence information from ncRNA and drugs to extract features, while most other models, such as NTSHMDA, DMFMDA, and SDLDA, use one-hot encoding, ignoring important sequence information. Additionally, sensitivity association helps select more reliable negative samples, effectively addressing the sample imbalance problem. Moreover, the diffusion map dimensionality reduction method enhances feature continuity in the manifold space, which benefits downstream learning. By leveraging these approaches, the GAT graph neural network can better utilize attention mechanisms to fully capture the association features between RNA and drugs, ultimately improving performance.

We also compared the prediction ability of lncRNA and miRNA associated with drug resistance separately with other models. Table 2 shows the prediction ability of DMGAT compared to several other state-of-the-art models for lncRNA and drug association. The data of the performance of other lncRNA-drug prediction models is from [50]. Table 3 shows the prediction ability for miRNA and drug association. The data of the performance of other miRNA-drug prediction models are from [51].

Table 2.

Performance of lncRNA-drug prediction comparison among different methods

Methods AUC AUPR
NetLapRLS 0.811 0.547
BLM-NI 0.784 0.560
KBMF2K 0.841 0.521
CMF 0.834 0.538
DMGAT 0.8397 0.8431

Numbers in bold indicate the best performance.

Table 3.

Performance of miRNA-drug prediction comparison among different methods

Methods AUC AUPR Recall F1-score
CF 0.8618 0.2046 0.3314 0.2873
LP 0.8610 0.2262 0.3176 0.3075
GF 0.8530 0.1619 0.2745 0.2318
SDNE 0.8693 0.1872 0.3012 0.2629
DMGAT 0.8928 0.8961 0.9500 0.8267

Numbers in bold indicate the best performance.

Ablation study

Impact of each module

To evaluate the contribution of each module in our proposed model, we compared the full DMGAT model with variants in which specific modules were replaced or removed. The full DMGAT model achieves the highest AUC and AUPR. When the word2vec encoding module is replaced with one-hot, TF-IDF (term frequency-inverse document frequency), or BoW (bag-of-words) encoding, performance drops slightly. TF-IDF is a numerical statistic that reflects the importance of a word within a document relative to a corpus, by considering its frequency in the document and its rarity across the entire dataset. BoW is a text representation method that models a document as an unordered set of words, disregarding grammar and word order while preserving word frequency information. When the diffusion map feature extraction module is removed, the model’s performance deteriorates. When the GCN module is removed, the similarity matrices of ncRNAs and drugs are placed in the diagonal blocks of the adjacency matrix of the GAT’s heterogeneous graph, representing the transitions among ncRNAs and among drugs. For the model without the GAT module, the ncRNA and drug features output by the graph convolutional networks are multiplied directly to obtain the association prediction score. The GAT module appears to be the most critical component, as its removal causes the largest drop in AUC (from 0.8964 to 0.8211).

While the GCN and diffusion map also contribute to the model’s effectiveness, their impact on AUC is less pronounced than that of the GAT. The full DMGAT model benefits from the interplay of all three components, yielding the highest AUC, AUPR, and F1-score, although several ablated variants reach slightly higher recall. As shown in Table 4, the best overall performance is achieved by combining the three modules.

Table 4.

Performance comparison among removing different modules of DMGAT

Methods AUC AUPR Recall F1-score
DMGAT (one-hot) 0.8741 0.8778 0.9577 0.8031
DMGAT (TF-IDF) 0.8629 0.8668 0.9557 0.7952
DMGAT (BoW) 0.8534 0.8605 0.9587 0.7855
DMGAT (no diffusion map) 0.8378 0.8495 0.9633 0.7706
DMGAT (no GCN) 0.8311 0.8416 0.9655 0.7641
DMGAT (no GAT) 0.8211 0.8442 0.9533 0.7776
DMGAT 0.8964 0.8984 0.9576 0.8285

Numbers in bold indicate the best performance.

Impact of the number of GCN and GAT layers

To determine the optimal number of GCN and GAT layers, we varied each from 1 to 5. Fig. 2 shows the AUC and AUPR values for the different combinations of GCN and GAT layer counts. The AUC reaches its peak of 0.8964 when the numbers of GCN and GAT layers are 2 and 4, respectively, while the AUPR remains relatively high at 0.8984. We therefore chose the model with two GCN layers and four GAT layers.

Figure 2.


(a) AUC values of DMGAT with different numbers of GCN and GAT layers; (b) AUPR values of DMGAT with different numbers of GCN and GAT layers.

Case study

To verify whether the model can accurately predict new ncRNA-drug associations, we trained the model on all data and computed prediction scores for all ncRNA-drug pairs. We then selected the 20 unlabeled associations with the highest predicted scores. As shown in Table 5, 17 of these top 20 associations have already been experimentally verified. For instance, miR-26b enhances gemcitabine resistance in the pancreatic cancer cell line PANC-1 by inhibiting p53 expression through targeting the 3′ UTR of the p53 gene [52]. Similarly, miR-195 downregulation promotes 5-fluorouracil (5-FU) resistance in gastric cancer by upregulating HMGA1 expression, thereby contributing to acquired drug resistance [53]. Moreover, let-7c is downregulated in gemcitabine-resistant pancreatic cancer cells, contributing to the acquisition of epithelial-to-mesenchymal transition (EMT) characteristics. Restoring let-7c expression can reverse EMT and improve gemcitabine sensitivity, highlighting its potential as a therapeutic target for overcoming drug resistance [54]. Furthermore, miR-134 is downregulated in cisplatin-resistant lung adenocarcinoma cells, contributing to multidrug resistance (MDR) by targeting forkhead box M1 and multidrug resistance-associated protein 1. Restoring miR-134 expression may improve cisplatin sensitivity and help to overcome MDR in lung adenocarcinoma [55].

Table 5.

The top 20 predicted ncRNA-drug resistance associations, ranked by predicted score

ncRNA Drug Evidence (PMID)
miR-26b Gemcitabine 23799850
miR-195 5-Fluorouracil 31115003
miR-26a Gemcitabine 39288140
miR-200a Gemcitabine 19654291
miR-365 Doxorubicin Unconfirmed
miR-30a Cisplatin 27212164
miR-122 Gemcitabine 31733293
miR-654-5p Doxorubicin 32329825
miR-423 Doxorubicin 30344696
let-7c Gemcitabine 19654291
miR-768 Doxorubicin 32714393
miR-4454 Doxorubicin 34777698
miR-23b Doxorubicin 34216852
miR-133b Gemcitabine Unconfirmed
miR-519d 5-Fluorouracil 29771440
miR-216 Gemcitabine Unconfirmed
miR-196a-5p 5-Fluorouracil 33130965
miR-146b 5-Fluorouracil 29048680
miR-134 Cisplatin 28454276
miR-449a Cisplatin 24248414
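The selection step behind Table 5 can be sketched as a ranking of unlabeled pairs by predicted score. The example below is a minimal illustration under stated assumptions: the scores, the pair names, and the `known` set are toy values invented here, and `top_k_unlabeled` is a hypothetical helper rather than a function from the DMGAT repository.

```python
def top_k_unlabeled(scores, known, k=20):
    """Rank ncRNA-drug pairs that have no known label by predicted score."""
    candidates = [
        (score, rna, drug)
        for (rna, drug), score in scores.items()
        if (rna, drug) not in known
    ]
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(rna, drug, score) for score, rna, drug in candidates[:k]]

# Toy scores and labels, invented for illustration only.
scores = {
    ("ncRNA_a", "drug_x"): 0.97,
    ("ncRNA_a", "drug_y"): 0.41,
    ("ncRNA_b", "drug_x"): 0.88,
    ("ncRNA_b", "drug_y"): 0.93,
    ("ncRNA_c", "drug_x"): 0.12,
}
known = {("ncRNA_a", "drug_x")}  # already labeled, so excluded from the ranking

top2 = top_k_unlabeled(scores, known, k=2)
```

Excluding labeled pairs before ranking is what makes the resulting list a set of new predictions; each candidate then has to be checked against the literature separately, as done for the Evidence column.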

Discussion

To validate the biological reasonableness of feature extraction in our model, we calculated the cosine similarity of features among ncRNAs associated with the same drug. Each drug is associated with several ncRNAs, and we computed the average cosine similarity between these ncRNAs. A higher value indicates that ncRNAs associated with this drug have more similar features.

The distribution of these values for all drugs is shown in Fig. 3. The first column represents the similarity distribution for known associated ncRNAs only, the second column shows the distribution for predicted associated ncRNAs only, and the third column displays the distribution for ncRNAs randomly assigned to drugs. These random associations may contain some false associations.

Figure 3.


Distribution of the mean of cosine similarity between ncRNAs associated with the same drug.

In Fig. 3, the first column has the highest average value, indicating that ncRNAs known to be associated with the same drug have very similar features. The average value of the second column is greater than that of the third column, showing that the features of predicted associated ncRNAs are closer to each other than those of randomly selected ncRNAs. This suggests that the model can effectively distinguish ncRNAs associated with different drugs in the feature space.
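The per-drug statistic summarized in Fig. 3 can be sketched as the mean cosine similarity over all pairs of ncRNAs associated with a drug. The embeddings and drug-to-ncRNA mapping below are toy values invented for illustration, and the function names are assumptions rather than names from the DMGAT code.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs (needs >= 2 vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy ncRNA embeddings and drug associations, invented for illustration only.
features = {"r1": [1.0, 0.0], "r2": [1.0, 0.0], "r3": [0.0, 1.0]}
drug_to_rnas = {"drug_x": ["r1", "r2"], "drug_y": ["r1", "r3"]}

per_drug = {
    drug: mean_pairwise_cosine([features[r] for r in rnas])
    for drug, rnas in drug_to_rnas.items()
}
```

Each drug contributes one value to the distribution, so drugs with fewer than two associated ncRNAs have to be skipped. Repeating the computation with known, predicted, and randomly assigned ncRNA sets gives the three columns being compared.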

Large-scale validation of case-study predictions remains a challenging issue. We have validated our method and prediction results from three perspectives.

  • Computational evaluation: We constructed a reliable negative sample set and applied computational metrics such as AUC and AUPR to demonstrate the superiority of our method over existing approaches (as shown in Tables 1, 2 and 3).

  • Feature similarity analysis: We introduced a new metric that tests whether ncRNAs with similar embedding features tend to be associated with the same drugs (illustrated in Fig. 3).

  • Experimental verification: We selected 20 predicted associations not present in the database, 17 of which were subsequently confirmed by other researchers through experimental validation. The number of confirmed predicted associations is higher than that of other methods (as shown in Table 5).

While we have made significant efforts to validate our approach computationally, biological experimental validation remains essential. Most studies analyze the relationship between ncRNA and drug resistance through specific biological models in vivo and in vitro. We look forward to further biological validation of the three associations among the top 20 predictions that have not yet been experimentally verified.

Processing and validating large-scale data is difficult, and the cost does not scale linearly. As the number of ncRNAs and drugs increases, the computational power and time required for processing and verification grow significantly. Fortunately, research on ncRNA-drug associations is not yet as mature as that on drug–drug or drug–protein interactions, and the current data scale is still relatively small. In the future, as the data grow, we may consider using techniques such as knowledge distillation to reduce computational overhead.

Our work focuses specifically on predicting ncRNA–drug associations; we did not include ncRNA–disease or drug–disease interaction data in this study. Without the disease context, however, the therapeutic relevance of the predictions remains uncertain, especially when distinguishing true targeted treatment effects from secondary or off-target associations. For example, BC200 lncRNA is overexpressed in colorectal cancer cells and is located adjacent to the oncogene EpCAM. BC200 RNA and EpCAM are involved in cell migration and invasion. A drug targeting BC200 might influence EpCAM activity, leading to effects that could be either therapeutic or off-target, depending on the context [56]. Integrating disease-level associations is crucial for understanding how ncRNA-drug interactions can translate into clinical outcomes.

However, bridging this gap requires establishing causal links between ncRNAs, drugs, and the biological processes of diseases, and confirming these relationships through large-scale validation. Incorporating additional information when disease-related data are insufficient can increase the complexity of a multimodal model, potentially leading to adverse or counterproductive effects. In addition, drug sensitivity and drug resistance prediction tasks differ in how useful disease context is [57]. An ncRNA’s association with a disease may reflect many disease-related processes or biomarkers unrelated to drug action, so including that data can mislead the model away from true resistance pathways.

For these reasons, we chose not to incorporate disease-level information into our current prediction model. However, if these issues can be addressed in the future, disease information could play a significant supporting role in drug screening and medical diagnosis.

Conclusion

In this study, we proposed DMGAT, a deep learning model that uses diffusion maps to predict potential ncRNA-drug associations. The model integrates a heterogeneous graph convolutional network with a heterogeneous graph attention network to extract latent features of ncRNAs and drugs. Additionally, to tackle the imbalance and sparsity of the dataset, sensitivity associations between ncRNAs and drugs are introduced to select reliable negatives.

The performance of DMGAT was evaluated through 5-fold cross-validation on a dataset curated from NoncoRNA and ncDR. DMGAT outperforms seven other state-of-the-art methods in terms of AUC, AUPR, recall, and F1-score. Ablation experiments show that each module is essential for achieving this performance. We then conducted a case study on the top 20 predicted associations, which successfully identified differentially expressed ncRNAs from the literature, demonstrating the model’s capability to predict potential associations and offering valuable insights for future experimental validation.

However, there are still some limitations in our approach. First, the use of word2vec and diffusion map for sequence embedding, while effective, can be computationally intensive for large datasets. As the amount of ncRNA and drug data continues to grow rapidly, this technique may face challenges when dealing with future massive datasets. Second, the attention mechanism in our graph attention network requires a large number of parameters. When dealing with extensive data, this could lead to significant memory usage and increased computational demands, potentially resulting in longer training and inference times. Additionally, our current model only considers the associations between ncRNAs and drugs. In the future, we aim to expand this model by incorporating other relevant information, such as ncRNA-disease associations and drug-disease interactions. This expansion could lead to the development of a more comprehensive model capable of simultaneously predicting the relationships among ncRNAs, drugs, and diseases, thereby providing a more holistic view of these complex biological interactions.

Key Points

  • We use word2vec and diffusion map to embed non-coding RNA (ncRNA) sequences and drug simplified molecular input line entry system (SMILES) strings, capturing information from both the entire sequence and its subsequences.

  • Sensitivity associations are used, together with a random forest classifier, to select reliable negative associations and thereby address the problem of dataset imbalance.

  • The attention mechanism of the graph network is applied to integrate the features of ncRNAs and drugs thoroughly.

  • The experimental results on benchmark datasets show that diffusion map and heterogeneous graph attention network (DMGAT) outperforms seven other state-of-the-art approaches. The case study shows that DMGAT has the potential to identify new ncRNA-drug resistance pairs.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable suggestions.

Contributor Information

Tingyu Liu, School of Medicine and Health, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, China.

Qiuhao Chen, Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.

Renjie Liu, Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.

Yuzhi Sun, School of Computer Science and Technology, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.

Yadong Wang, School of Computer Science and Technology, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.

Yan Zhu, College of Veterinary Medicine, Northeast Agricultural University, 150038, Xiangfang District, Changjiang Road No. 600, Harbin, China.

Tianyi Zhao, School of Medicine and Health, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, China; Zhengzhou Research Institute, Harbin Institute of Technology, 150000, Nangang District, Xidazhi Street No. 90, Harbin, Heilongjiang, China.

Conflict of interest: None declared.

Funding

National Key R&D Plan (2022YFF1202101).

Data availability

The raw data are released to Zenodo with identifier 13929676. The source code of DMGAT is available at https://github.com/liutingyu0616/DMGAT/tree/main.

References

  • 1. Wang L, Xuan Z, Zhou S. et al. A novel model for predicting lncRNA-disease associations based on the lncRNA-miRNA-disease interactive network. Curr Bioinform 2019;14:269–78. 10.2174/1574893613666180703105258
  • 2. Fan S-B, Xie X-F, Wei W. et al. Senescence-related lncRNAs: pioneering indicators for ovarian cancer outcomes. Phenomics 2024;4:379–93. 10.1007/s43657-024-00163-z
  • 3. Ji Q, Jiang X, Wang M. et al. Multimodal omics approaches to aging and age-related diseases. Phenomics 2024;4:56–71. 10.1007/s43657-023-00125-x
  • 4. Treiber T, Treiber N, Meister G. Regulation of microRNA biogenesis and its crosstalk with other cellular pathways. Nat Rev Mol Cell Biol 2019;20:5–20. 10.1038/s41580-018-0059-1
  • 5. Zhao P, Chang J, Chen YK. et al. Cellular senescence-related long non-coding RNA signatures predict prognosis in juvenile osteosarcoma. Phenomics 2024;4:430–52. 10.1007/s43657-023-00132-y
  • 6. Jeck WR, Sorrentino JA, Wang K. et al. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA (New York, NY) 2013;19:141–57. 10.1261/rna.035667.112
  • 7. Kristensen LS, Hansen TB, Venø MT. et al. Circular RNAs in cancer: opportunities and challenges in the field. Oncogene 2018;37:555–65. 10.1038/onc.2017.361
  • 8. Kristensen LS, Andersen MS, Stagsted LVW. et al. The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet 2019;20:675–91. 10.1038/s41576-019-0158-7
  • 9. Lin R, Zheng S, Haiyu S. et al. Integrated transcriptome analysis of lncRNA, miRNA, and mRNA reveals key regulatory modules for polycystic ovary syndrome. Phenomics 2024;4:570–83. 10.1007/s43657-024-00183-9
  • 10. Seto AG, Kingston RE, Lau NC. The coming of age for piwi proteins. Mol Cell 2007;26:603–9. 10.1016/j.molcel.2007.05.021
  • 11. Zhou J, Zhou W, Zhang R. The potential mechanisms of piRNA to induce hepatocellular carcinoma in human. Med Hypotheses 2021;146:110400. 10.1016/j.mehy.2020.110400
  • 12. Xie X-f, Xiao-qian H, Liu D-x. et al. Identification of a novel pyroptosis-related lncRNAs prognosis model and subtypes in ovarian cancer. Phenomics 2025. 10.1007/s43657-024-00173-x
  • 13. Witusik-Perkowska M, Zakrzewska M, Jaskolski DJ. et al. Artificial microenvironment of in vitro glioblastoma cell cultures changes profile of miRNAs related to tumor drug resistance. Onco Targets Ther 2019;12:3905–18. 10.2147/OTT.S190601
  • 14. Cui T, Bell EH, McElroy J. et al. A novel miR-146a-POU3F2/SMARCA5 pathway regulates stemness and therapeutic response in glioblastoma. Mol Cancer Res 2021;19:48–60. 10.1158/1541-7786.MCR-20-0353
  • 15. Lanlin H, Liang Y, Kelv W. et al. Repressing PDCD4 activates JNK/ABCG2 pathway to induce chemoresistance to fluorouracil in colorectal cancer cells. Ann Transl Med 2021;9:114. 10.21037/atm-20-4292
  • 16. Dai E, Yang F, Jing Wang X. et al. ncDR: a comprehensive resource of non-coding RNAs involved in drug resistance. Bioinformatics (Oxford, England) 2017;33:4010–1. 10.1093/bioinformatics/btx523
  • 17. Li Y, Wang R, Zhang S. et al. LRGCPND: predicting associations between ncRNA and drug resistance via linear residual graph convolution. Int J Mol Sci 2021;22:10508.
  • 18. Zhang P, Wang Z, Sun W. et al. RDRGSE: a framework for noncoding RNA-drug resistance discovery by incorporating graph skeleton extraction and attentional feature fusion. ACS Omega 2023;8:27386–97. 10.1021/acsomega.3c02763
  • 19. Zheng J, Qian Y, He J. et al. Graph neural network with self-supervised learning for noncoding RNA–drug resistance association prediction. J Chem Inf Model 2022;62:3676–84. 10.1021/acs.jcim.2c00367
  • 20. Li L, Pengfei W, Wang Z. et al. NoncoRNA: a database of experimentally supported non-coding RNAs and drug targets in cancer. J Hematol Oncol 2020;13:1–4.
  • 21. Xinyu Cao X, Zhou FH, Huang Y-e. et al. ncRNADrug: a database for validated and predicted ncRNAs associated with drug resistance and targeted by drugs. Nucleic Acids Res 2024;52:D1393–9.
  • 22. Le NQK. Predicting emerging drug interactions using GNNs. Nat Comput Sci 2023;3:1007–8. 10.1038/s43588-023-00555-7
  • 23. Zhao Z, Gui J, Yao A. et al. Improved prediction model of protein and peptide toxicity by integrating channel attention into a convolutional neural network and gated recurrent units. ACS Omega 2022;7:40569–77. 10.1021/acsomega.2c05881
  • 24. Luo J, Long Y. NTSHMDA: prediction of human microbe-disease association based on random walk by integrating network topological similarity. IEEE/ACM Trans Comput Biol Bioinform 2018;17:1341–51.
  • 25. Chen X, Huang Y-A, You Z-H. et al. A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics (Oxford, England) 2017;33:733–9. 10.1093/bioinformatics/btw715
  • 26. Zeng M, Chengqian L, Zhang F. et al. SDLDA: lncRNA-disease association prediction based on singular value decomposition and deep learning. Methods (San Diego, Calif) 2020;179:73–80.
  • 27. Deepthi K, Jereesh AS. Inferring potential circRNA–disease associations via deep autoencoder-based classification. Mol Diagn Ther 2021;25:87–97. 10.1007/s40291-020-00499-y
  • 28. Peng L-H, Yin J, Zhou L. et al. Human microbe-disease association prediction based on adaptive boosting. Front Microbiol 2018;9:2440. 10.3389/fmicb.2018.02440
  • 29. Liu Y, Wang S-L, Zhang J-F. et al. DMFMDA: prediction of microbe-disease associations based on deep matrix factorization using Bayesian personalized ranking. IEEE/ACM Trans Comput Biol Bioinform 2020;18:1763–72.
  • 30. Liu Y, Wang S-L, Zhang J-F. et al. DMFMDA: prediction of microbe-disease associations based on deep matrix factorization using Bayesian personalized ranking. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1763–72. 10.1109/TCBB.2020.3018138
  • 31. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 1988;28:31–6. 10.1021/ci00057a005
  • 32. Mikolov T. Efficient estimation of word representations in vector space. arXiv preprint, arXiv:1301.3781. 2013.
  • 33. Řehůřek R, Sojka P. Software Framework for Topic Modelling with Large Corpora. In Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks. Valletta, Malta: University of Malta, 2010, p. 46–50. ISBN 2-9517408-6-7.
  • 34. Li Z, Liu L, Feng C. et al. LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res 2023;51:D186–91. 10.1093/nar/gkac999
  • 35. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res 2019;47:D155–62. 10.1093/nar/gky1141
  • 36. Knox C, Wilson M, Klinger CM. et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res 2024;52:D1265–75. 10.1093/nar/gkad976
  • 37. Van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics (Oxford, England) 2011;27:3036–43. 10.1093/bioinformatics/btr500
  • 38. Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics (Oxford, England) 2015;31:2989–98. 10.1093/bioinformatics/btv325
  • 39. Nestorowa S, Hamey FK, Sala BP. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood, J Am Soc Hematol 2016;128:e20–31. 10.1182/blood-2016-05-716480
  • 40. Zhang H, Ming Z, Fan C. et al. A path-based computational model for long non-coding RNA-protein interaction prediction. Genomics 2020;112:1754–60. 10.1016/j.ygeno.2019.09.018
  • 41. Liu L, Zhou Y, Lei X. RMDGCN: prediction of RNA methylation and disease associations based on graph convolutional network with attention mechanism. PLoS Comput Biol 2023;19:e1011677. 10.1371/journal.pcbi.1011677
  • 42. Zhang J, Hu X, Jiang Z. et al. Predicting disease-related RNA associations based on graph convolutional attention network. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 177–82. IEEE.
  • 43. Mudiyanselage TB, Lei X, Senanayake N. et al. Predicting CircRNA disease associations using novel node classification and link prediction models on graph convolutional networks. Methods (San Diego, Calif) 2022;198:32–44. 10.1016/j.ymeth.2021.10.008
  • 44. Honglin S, Gao H. PDA-GCN: predicting piwi-interacting RNA-disease associations based on graph convolution network. In: 2023 11th International Conference on Bioinformatics and Computational Biology (ICBCB), pp. 118–22. IEEE.
  • 45. Liu Y, Zhang F, Ding Y. et al. MRDPDA: a multi-Laplacian regularized deepFM model for predicting piRNA-disease associations. J Cell Mol Med 2024;28:e70046. 10.1111/jcmm.70046
  • 46. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inform Process Syst 2016;29.
  • 47. Bian C, Lei X-J, Fang-Xiang W. GATCDA: predicting circRNA-disease associations based on graph attention network. Cancer 2021;13:2595.
  • 48. Cao R, He C, Wei P. et al. Prediction of circRNA-disease associations based on the combination of multi-head graph attention network and graph convolutional network. Biomolecules 2022;12:932. 10.3390/biom12070932
  • 49. Veličković P, Cucurull G, Casanova A. et al. Graph attention networks. arXiv preprint, arXiv:1710.10903. 2017.
  • 50. Liu Y, Min W, Miao C. et al. Neighborhood regularized logistic matrix factorization for drug-target interaction prediction. PLoS Comput Biol 2016;12:e1004760. 10.1371/journal.pcbi.1004760
  • 51. Niu Y, Song C, Gong Y. et al. MiRNA-drug resistance association prediction through the attentive multimodal graph convolutional network. Front Pharmacol 2022;12:799108. 10.3389/fphar.2021.799108
  • 52. Wang Y, Yang L, Chang Y. MiR-26b enhances drug resistance of pancreatic cancer cells to gemcitabine by inhibiting p53 gene expression. Int J Lab Med 2017;3073–6.
  • 53. Wang C-Q. MiR-195 reverses 5-FU resistance through targeting HMGA1 in gastric cancer cells. Eur Rev Med Pharmacol Sci 2019;23.
  • 54. Li Y, VandenBoom TG, Kong D. et al. Up-regulation of miR-200 and let-7 by natural agents leads to the reversal of epithelial-to-mesenchymal transition in gemcitabine-resistant pancreatic cancer cells. Cancer Res 2009;69:6704–12. 10.1158/0008-5472.CAN-09-1298
  • 55. Li J, Chen Y, Jin M. et al. MicroRNA-134 reverses multidrug resistance in human lung adenocarcinoma cells by targeting FOXM1. Oncol Lett 2017;13:1451–5. 10.3892/ol.2017.5574
  • 56. Cao X-G, Zhao R, Zhu C. et al. BC200 LncRNA a potential predictive marker of poor prognosis in esophageal squamous cell carcinoma patients. Onco Targets Ther 2016;2221. 10.2147/OTT.S99401
  • 57. Mégret L, Mendoza C, Lobo MA. et al. Precision machine learning to understand micro-RNA regulation in neurodegenerative diseases. Front Mol Neurosci 2022;15:914830.



Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
