Biology Methods & Protocols. 2025 Sep 2;10(1):bpaf065. doi: 10.1093/biomethods/bpaf065

Integrating multiple microRNA functional similarity networks for improved disease–microRNA association prediction

Duc-Hau Le
PMCID: PMC12410926  PMID: 40919056

Abstract

MicroRNAs (miRNAs) play a critical role in disease mechanisms, making the identification of disease-associated miRNAs essential for precision medicine. We propose a novel computational method, multiplex-heterogeneous network for MiRNA-disease associations (MHMDA), which integrates multiple miRNA functional similarity networks and a disease similarity network into a multiplex-heterogeneous network. This approach employs a tailored random walk with restart algorithm to predict disease-miRNA associations, leveraging the complementary information from experimentally validated and predicted miRNA-target interactions, as well as disease phenotypic similarities. Evaluated on the human microRNA disease database and miR2Disease datasets using leave-one-out cross-validation and 5-fold cross-validation, MHMDA demonstrates superior performance, achieving area under the receiver operating characteristic curve values of 0.938 and 0.913 on human microRNA disease database and miR2Disease, respectively, and outperforming existing methods. The integration of multiplex networks enhances prediction accuracy by capturing diverse miRNA functional relationships, which directly contributes to the high area under the receiver operating characteristic curve and area under the precision-recall curve values observed. Additionally, MHMDA’s stability across parameter variations and disease contexts underscores its robustness and potential for real-world applications in identifying novel disease-miRNA associations.

Keywords: miRNA-disease association, multiplex-heterogeneous networks, random walk with restart algorithm, miRNA functional similarity, MHMDA method

Introduction

MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression at the posttranscriptional level [1, 2], playing pivotal roles in the molecular mechanisms of both common [3–7] and rare diseases [8]. Dysregulation of miRNAs has been implicated in a wide range of conditions, including cancer, metabolic disorders, and neurodegenerative diseases, making them promising biomarkers and therapeutic targets [9]. For example, hsa-miR-21 is overexpressed in breast cancer, influencing tumor progression, while hsa-miR-146a is linked to immune responses in Alzheimer’s disease [8, 10]. Identifying disease-associated miRNAs is thus critical for understanding disease mechanisms and advancing precision medicine. However, experimental validation of miRNA-disease associations is time-consuming and costly, driving the need for computational methods to predict novel associations efficiently.

Numerous computational approaches have been developed to predict disease-miRNA associations, which can be broadly categorized into machine learning-based and network-based methods [11, 12]. Machine learning-based methods leverage powerful algorithms to extract features from biological data and make predictions. Traditional machine learning models, such as support vector machines [13], k-nearest neighbor [14], decision trees [15, 16], regularized least squares [17], matrix completion [18, 19] and matrix factorization [20], have been widely applied to this problem. For example, Xu et al. (2011) used support vector machines to prioritize candidate miRNAs for prostate cancer by integrating miRNA expression profiles and target gene dysregulation [13]. More recently, deep learning models have gained traction due to their ability to automatically extract complex features from high-dimensional data, such as miRNA sequences and expression profiles. These include deep belief networks [21], convolutional neural networks [22], autoencoders [23], graph convolutional networks [24, 25], and knowledge graph representation learning models [26]. For instance, Yu et al. (2022) proposed a knowledge-driven network model that constructs a fine-grained miRNA-disease knowledge graph by integrating multi-omics data, achieving improved prediction performance by capturing semantic relationships between miRNAs and diseases [26]. Deep learning models offer significant advantages, such as scalability and the ability to handle sparse data through techniques like autoencoders, which can generate synthetic data to address data scarcity [23]. Graph convolutional networks, in particular, excel at capturing structural relationships in miRNA-disease interaction networks [24, 25]. However, these models often require large, high-quality datasets, which are limited in miRNA studies due to incomplete or noisy data. 
Additionally, their computational complexity demands significant resources and expertise, potentially leading to overfitting or poor generalization on sparse datasets [12]. Given these challenges, network-based methods emerge as a complementary approach, leveraging biological interaction networks to incorporate prior knowledge and improve interpretability, particularly in data-scarce environments.

Network-based methods leverage biological interaction networks to provide prior knowledge, reduce data dependency, and enhance interpretability by integrating relational information that deep learning models might overlook. These methods often rely on the “disease module” principle, which posits that functionally related miRNAs are associated with phenotypically similar diseases [27, 28]. Network-based approaches can be further divided into local and global similarity measure-based methods [29]. Local similarity methods focus on direct neighbors of known disease-associated miRNAs within miRNA functional similarity networks (also called miRNA monoplex networks, where nodes are miRNAs and edges represent functional relatedness based on shared target genes), whereas global similarity methods, such as random walk with restart (RWR), utilize the entire network topology to capture broader interactions. Local similarity methods, such as those proposed by [30–32], assess direct neighbors of known disease-associated miRNAs in miRNA functional similarity networks. For example, Wang et al. (2010) developed a method to infer miRNA functional similarity based on disease associations, which was then used to predict novel disease-miRNA associations [32]. However, local similarity methods are limited by their inability to capture broader network structures. Global similarity measure-based methods, such as RWR, address this limitation by considering the entire network topology [33–35]. Random walk with restart for miRNA–disease association (RWRMDA) [33], for instance, applies RWR on a miRNA monoplex network to rank candidate miRNAs for a given disease, outperforming local similarity methods by capturing long-range interactions. Despite its success, RWRMDA overlooks disease similarity, limiting its ability to fully exploit the disease module principle. 
To address this, random walk with restart for human microRNA-disease association (RWRHMDA) [36] integrates a disease similarity network with a miRNA monoplex network, forming a heterogeneous network, and uses a variant of RWR to predict associations, demonstrating improved performance over RWRMDA.

Despite these advances, current network-based methods face a key limitation: they rely on a single miRNA monoplex network, often constructed from a single miRNA-target database. The quality of these databases varies significantly, as they include both experimentally validated (e.g. miRWalk [37], miRTarBase [38]) and predicted interactions (e.g. TargetScan [39–42], miRDB [43]), with differences even among validated sources [44, 45]. This variability impacts the reliability of miRNA functional similarity networks, suggesting that integrating multiple networks could enhance prediction accuracy.

To overcome these limitations (namely, the data dependency of single-network approaches and the narrow scope of existing heterogeneous methods), we introduce the multiplex-heterogeneous network for MiRNA-disease associations (MHMDA). MHMDA’s three key innovations directly address these gaps: (i) integrating multiple miRNA functional similarity networks (e.g. from miRWalk and TargetScan) to mitigate data scarcity and enhance functional coverage, countering the single-source limitation of RWRMDA; (ii) constructing a multiplex-heterogeneous network that combines miRNA multiplex and disease similarity layers, overcoming the restricted diversity in RWRHMDA; and (iii) employing a tailored RWR algorithm to optimize predictions across this complex structure, addressing the suboptimal performance of generic RWR on heterogeneous data. This approach enables MHMDA to capture a wider range of miRNA-disease associations. We evaluate MHMDA using leave-one-out cross-validation (LOOCV) on known associations from the human microRNA disease database (HMDD) and miR2Disease datasets, demonstrating that it outperforms existing methods such as RWRMDA and RWRHMDA. A comparison with the state-of-the-art method multi-channel attentive multi-view fused graph attention network (MAMFGAT) [46], using 5-fold cross-validation, revealed comparable area under the receiver operating characteristic curve (AUROC) performance on HMDD and a higher AUROC on miR2Disease. We also assessed MHMDA’s robustness across diverse disease characteristics, finding negligible correlation between prediction performance (AUROC) and data imbalance, and minimal variation across disease categories (e.g. cancer, cardiovascular) and tissues (e.g. blood, brain). Finally, we apply MHMDA to predict novel miRNAs for diseases such as lung and prostate cancer, with top predictions validated by public databases and literature. This study provides a more comprehensive and accurate approach to disease-miRNA association prediction, with potential to accelerate the discovery of new biomarkers and therapeutic targets.

Materials and methods

This section describes the methodology for predicting disease-miRNA associations using the MHMDA method. We first construct various networks of diseases and miRNAs, including monoplex, multiplex, heterogeneous, and multiplex-heterogeneous networks. Then, we apply a tailored RWR algorithm to rank candidate miRNAs for a given disease. Finally, we evaluate the prediction performance using LOOCV on known disease-miRNA associations.

Construction of networks of diseases and miRNAs

To predict disease-miRNA associations, we constructed a series of networks capturing functional relationships between miRNAs and phenotypic similarities between diseases. These networks include miRNA monoplex networks, a miRNA multiplex network, and heterogeneous and multiplex-heterogeneous networks integrating disease and miRNA information. Figure 1 illustrates the network construction process.

Figure 1.

Construction of disease and miRNA Networks for MHMDA. This figure illustrates the construction of various networks used in MHMDA, with nodes representing miRNAs or diseases and edges representing interactions or associations. (a) miRNA monoplex network (MonoNet_miRWalk) constructed from the experimentally validated miRNA-target database miRWalk. (b) miRNA monoplex network (MonoNet_TargetScan) constructed from the predicted miRNA-target database TargetScan. (c) Integrated miRNA monoplex network (MonoNet_Integrated) constructed by combining MonoNet_miRWalk and MonoNet_TargetScan. (d) Heterogeneous network (DiSimNet-MonoNet_miRWalk or DiSimNet-MonoNet_TargetScan) formed by connecting a miRNA monoplex network, the disease similarity network, and known disease-miRNA associations. (e) miRNA multiplex network (MultiNet_miRNA) composed of the two miRNA monoplex networks (MonoNet_miRWalk and MonoNet_TargetScan). (f) Multiplex-heterogeneous network (DiSimNet-MultiNet_miRNA) formed by connecting the disease similarity network and the multiplex network (MultiNet_miRNA) using known disease-miRNA associations

Construction of miRNA monoplex networks

For the development of miRNA monoplex networks, we utilized miRWalk [47] and TargetScan [48] to construct MonoNet_miRWalk and MonoNet_TargetScan, respectively. miRWalk provided 38 571 experimentally validated human miRNA-target interactions (involving 745 miRNAs and 11 976 genes), filtered for reliability by requiring at least 2 supporting experimental validations, while TargetScan contributed 520 526 predicted interactions (spanning 1547 miRNAs and 15 021 genes) based on non-conserved site context++ scores with a cumulative weighted context++ score better than −0.3. Following a methodology akin to prior work [29, 36], we established functional relationships between miRNAs by linking those that co-target at least one gene, with similarity quantified as the count of common target genes divided by the lesser number of targets associated with either miRNA. This resulted in MonoNet_miRWalk with 730 miRNAs and 29 089 interactions, and MonoNet_TargetScan with 1428 miRNAs and 46 118 interactions, capturing both validated and potential miRNA-target relationships.

In each monoplex network, nodes represent miRNAs, and edges represent functional similarity based on shared target genes. Two miRNAs were considered functionally related if they shared at least one target gene, reflecting the biological principle that miRNAs with common targets often regulate similar pathways [27]. The similarity between miRNAs i and j was quantified as the number of common target genes divided by the lesser number of targets associated with either miRNA:

$$w_{ij} = \frac{|T_i \cap T_j|}{\min(|T_i|, |T_j|)}$$

where $T_i$ and $T_j$ are the sets of target genes for miRNAs i and j, respectively. This yielded two networks: MonoNet_miRWalk (Fig. 1a) and MonoNet_TargetScan (Fig. 1b).
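The similarity definition above can be illustrated with a minimal Python sketch. The function name `overlap_similarity` and the toy miRNA and gene identifiers are hypothetical and are not taken from the MHMDA implementation; the sketch only assumes that each miRNA comes with a set of target genes.

```python
from itertools import combinations

def overlap_similarity(targets):
    """Build weighted miRNA-miRNA edges from target-gene sets.

    targets: dict mapping a miRNA id to the set of its target gene ids.
    Returns a dict {(miRNA_a, miRNA_b): weight} with a < b.
    """
    edges = {}
    for a, b in combinations(sorted(targets), 2):
        shared = targets[a] & targets[b]
        if shared:  # link only miRNAs that co-target at least one gene
            # shared targets divided by the smaller of the two target sets
            edges[(a, b)] = len(shared) / min(len(targets[a]), len(targets[b]))
    return edges

# Toy data (hypothetical identifiers)
toy_targets = {
    "miR-A": {"G1", "G2", "G3"},
    "miR-B": {"G2", "G3"},
    "miR-C": {"G4"},
}
toy_edges = overlap_similarity(toy_targets)
```

In this toy case only miR-A and miR-B are linked, because miR-C shares no target with either; all targets of miR-B are also targets of miR-A, so the weight reaches its maximum of 1.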

To assess the benefit of integrating multiple data sources, we combined the two monoplex networks into an integrated network, MonoNet_Integrated (Fig. 1c), using a per-edge average method. The biological principle underpinning this integration is that miRNAs sharing common target genes are likely to regulate similar biological pathways, reflecting functional relatedness based on their co-regulatory roles in disease mechanisms. This principle is quantified by calculating the similarity between miRNAs based on their shared targets. For each pair of miRNAs i and j, the integrated edge weight was calculated as:

$$\bar{w}_{ij} = \frac{1}{M} \sum_{k=1}^{M} (w_{ij})_k$$

where M is the number of networks containing an interaction between miRNAs i and j, and $(w_{ij})_k$ is the edge weight in network k. We chose the per-edge average method to balance the contributions of each network while preserving the relative strengths of interactions. The resulting MonoNet_Integrated network consists of 1814 miRNAs and 74 368 interactions. Table 1 summarizes the statistics of the 3 monoplex networks, which were used to evaluate the RWRMDA method [33].
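A short sketch of the per-edge average may help make the role of M concrete: an edge present in only one network keeps its weight, while an edge present in several networks is averaged over those networks only. The function name `per_edge_average` and the toy edge lists are hypothetical.

```python
from collections import defaultdict

def per_edge_average(*networks):
    """Average each edge weight over the networks that contain the edge.

    Each network is a dict {(miRNA_a, miRNA_b): weight}; edges are undirected.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for net in networks:
        for (a, b), w in net.items():
            key = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as one edge
            sums[key] += w
            counts[key] += 1             # counts[key] plays the role of M
    return {key: sums[key] / counts[key] for key in sums}

# Toy networks (hypothetical weights)
net_validated = {("miR-A", "miR-B"): 0.8, ("miR-A", "miR-C"): 0.2}
net_predicted = {("miR-B", "miR-A"): 0.4, ("miR-B", "miR-D"): 0.5}
integrated = per_edge_average(net_validated, net_predicted)
```

Here the miR-A/miR-B edge appears in both networks and is averaged (M = 2), whereas the other two edges appear once and are carried over unchanged (M = 1).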

Table 1. Summary of miRNA monoplex networks used in MHMDA.

| miRNA monoplex network | Number of miRNAs | Number of interactions |
| --- | --- | --- |
| MonoNet_miRWalk | 730 | 29 089 |
| MonoNet_TargetScan | 1428 | 46 118 |
| MonoNet_Integrated | 1814 | 74 368 |

Construction of miRNA multiplex network

To leverage the complementary information from MonoNet_miRWalk and MonoNet_TargetScan, we constructed a miRNA multiplex network, MultiNet_miRNA, with two layers. Each layer corresponds to one of the monoplex networks, and the same miRNAs across layers are connected with inter-layer edges (Fig. 1e). This structure allows the RWR algorithm to explore functional similarities within and across layers, capturing a more comprehensive view of miRNA relationships. MultiNet_miRNA was used to evaluate the MHMDA-M method.

Database of known disease-miRNA associations

We obtained known disease-miRNA associations from two manually curated databases: miR2Disease [49] and HMDD (Human MicroRNA Disease Database) [50]. From miR2Disease, we extracted 270 experimentally validated associations between 53 diseases and 118 miRNAs. From HMDD, we collected 5618 associations between 243 diseases and 574 miRNAs. These associations were used to form a bipartite network linking diseases and miRNAs, which serves as the ground truth for training and evaluation. Table 2 summarizes the statistics of these datasets.

Table 2. Summary of databases of known disease–miRNA associations.

| Dataset | Number of miRNAs | Number of diseases | Number of known associations |
| --- | --- | --- | --- |
| miR2Disease | 118 | 53 | 270 |
| HMDD | 574 | 243 | 5618 |

Construction of heterogeneous and multiplex-heterogeneous networks

To incorporate disease similarity, we constructed a disease phenotype similarity network using a similarity matrix from [51], where each element represents the phenotypic similarity between two diseases based on text-mining of OMIM descriptions. We selected the top 5 neighbors with the highest similarity scores for each disease to form a sparse network, as this threshold balances connectivity and computational efficiency while avoiding noise from low-similarity connections. This resulted in a disease similarity network with 5080 diseases and 19 791 interactions.
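The top-neighbor sparsification can be sketched as follows. This is an illustration only: the function name `top_k_neighbors` is hypothetical, and the choice to symmetrize the result (keeping an edge if either endpoint selects the other) is an assumption, since the text does not state how asymmetric selections are handled.

```python
import numpy as np

def top_k_neighbors(sim, k=5):
    """Keep, for every node, only its k most similar neighbors.

    sim: square similarity matrix. Returns a sparse symmetric adjacency matrix.
    Assumption: an edge is kept if either endpoint selects the other.
    """
    sim = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(sim, -np.inf)           # never select a node as its own neighbor
    n = sim.shape[0]
    k = min(k, n - 1)
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sim[i])[::-1][:k]  # indices of the k largest similarities
        adj[i, nbrs] = sim[i, nbrs]
        adj[nbrs, i] = sim[i, nbrs]          # symmetrize (assumption)
    return adj

# Toy 4-disease similarity matrix (hypothetical values), keeping 1 neighbor each
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
])
toy_adj = top_k_neighbors(toy_sim, k=1)
```

With k = 1 the toy network keeps only the two strongest pairs (diseases 0 and 1, diseases 2 and 3), which shows how low-similarity connections are dropped.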

Heterogeneous networks were constructed by connecting the disease similarity network to each miRNA monoplex network (MonoNet_miRWalk and MonoNet_TargetScan) using the known disease-miRNA associations, yielding DiSimNet-MonoNet_miRWalk and DiSimNet-MonoNet_TargetScan (Fig. 1d). Similarly, a multiplex-heterogeneous network, DiSimNet-MultiNet_miRNA (Fig. 1f), was formed by connecting the disease similarity network to the miRNA multiplex network (MultiNet_miRNA). After integration, only diseases with known miRNA associations were retained, resulting in 53 diseases from miR2Disease and 171 from HMDD for analysis. These networks were used to evaluate the RWRHMDA [36] and MHMDA-MH methods, respectively.

RWR algorithm on networks of diseases and miRNAs

We applied an RWR algorithm to rank candidate miRNAs for a given disease based on their proximity to known disease-associated miRNAs and the disease itself in the network. The RWR algorithm simulates a walker that starts at a set of seed nodes and either moves to a neighboring node or restarts at the seed nodes with probability γ. Below, we describe the RWR implementation for monoplex/multiplex and heterogeneous/multiplex-heterogeneous networks.

RWR algorithm on monoplex and multiplex networks of miRNAs

For a miRNA monoplex network, represented as an n × n adjacency matrix $A_R$, where each entry $(A_R)_{i,j}$ indicates the similarity between miRNAs $r_i$ and $r_j$, the RWR process is expressed as:

$$P_R^{t+1} = (1-\gamma) M_R P_R^{t} + \gamma P_R^{0}$$

where $\gamma \in [0,1]$ denotes the restart probability, $P_R^{t}$ is an n × 1 vector of probabilities at time step t, and $M_R$ is the column-normalized transition matrix derived from $A_R$. The initial probability vector $P_R^{0}$ is set to $1/|S|$ for seed nodes (the set S of miRNAs known to be associated with the disease) and 0 otherwise, ensuring equal contribution from each seed node:

$$P_R^{0} = \begin{cases} 1/|S| & \text{if } r_i \in S \\ 0 & \text{otherwise} \end{cases}$$

The walker converges to a steady-state vector PR, which ranks miRNAs based on their proximity to the seed nodes. This approach was used in RWRMDA [33].
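The iteration on a monoplex network can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used in RWRMDA or MHMDA: the function names, the L1 convergence tolerance, and the toy path graph are all hypothetical.

```python
import numpy as np

def column_normalize(adj):
    """Divide each column by its sum so that columns sum to 1."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # leave isolated nodes as zero columns
    return adj / col_sums

def rwr(adj, seeds, gamma=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart; returns the steady-state probability vector."""
    M = column_normalize(adj)
    p0 = np.zeros(M.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # equal weight 1/|S| on every seed
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * M @ p + gamma * p0
        if np.abs(p_next - p).sum() < tol:  # assumed stopping rule (L1 change)
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
toy_scores = rwr(toy_adj, seeds=[0], gamma=0.5)
toy_ranking = [int(i) for i in np.argsort(-toy_scores)]
```

On the toy path graph the steady-state probability decreases with distance from the seed, so candidate nodes are ranked by network proximity to the seed set.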

For the miRNA multiplex network (MultiNet_miRNA), we extended the RWR algorithm to account for multiple layers [52]. The adjacency matrix is represented as follows:

$$A_{RM} = \begin{bmatrix} (1-\delta) A_{R[1]} & \cdots & \frac{\delta}{L-1} I \\ \vdots & \ddots & \vdots \\ \frac{\delta}{L-1} I & \cdots & (1-\delta) A_{R[L]} \end{bmatrix}$$

where L is the number of layers, $A_{R[i]}$ is the adjacency matrix of the miRNA monoplex network at layer i (i = 1, …, L), and I is the identity matrix. The parameter $\delta \in [0,1]$ is the probability of jumping between miRNA network layers (hereafter, the between-miRNA-network jumping probability). The RWR equation for a miRNA multiplex network becomes:

$$P_{RM}^{t+1} = (1-\gamma) M_{RM} P_{RM}^{t} + \gamma P_{RM}^{0}$$

where $P_{RM}^{t} = [P_{R[1]}^{t}, \ldots, P_{R[L]}^{t}]$ and $P_{RM}^{t+1} = [P_{R[1]}^{t+1}, \ldots, P_{R[L]}^{t+1}]$ are n × L matrices representing probabilities at time t and t + 1, and $M_{RM}$ is the column-normalized transition matrix of $A_{RM}$. The initial probability is $P_{RM}^{0} = \tau P_R^{0}$, where $\tau = [\tau_1, \ldots, \tau_L]$ weights the contribution of each layer. We set $\tau_1 = \cdots = \tau_L = 1/L$ to assume equal contribution from each layer, as we lacked prior knowledge to prioritize one layer over the other. The final score for each miRNA was computed as the geometric mean of its steady-state probabilities across the layers of $P_{RM}$, ensuring a balanced integration of information. We named this approach MHMDA-M.
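The multiplex walk can be sketched as below, assuming at least two layers defined over the same ordered set of miRNAs. The function names, the fixed iteration count, and the toy layers are hypothetical; the probabilities are stored as one stacked vector of length n × L rather than as an n × L matrix, which is only a storage choice.

```python
import numpy as np

def supra_adjacency(layers, delta=0.5):
    """Stack L >= 2 layer matrices into one (n*L) x (n*L) multiplex matrix."""
    L, n = len(layers), layers[0].shape[0]
    A = np.zeros((n * L, n * L))
    for i in range(L):
        for j in range(L):
            if i == j:
                block = (1 - delta) * layers[i]       # within-layer edges
            else:
                block = delta / (L - 1) * np.eye(n)   # same miRNA across layers
            A[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    return A

def multiplex_rwr(layers, seeds, gamma=0.5, delta=0.5, n_iter=200):
    """RWR on a multiplex network; returns one geometric-mean score per miRNA."""
    L, n = len(layers), layers[0].shape[0]
    A = supra_adjacency(layers, delta)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    M = A / col_sums                                  # column-normalized
    p0_layer = np.zeros(n)
    p0_layer[list(seeds)] = 1.0 / len(seeds)
    p0 = np.concatenate([p0_layer / L] * L)           # tau_k = 1/L for every layer
    p = p0.copy()
    for _ in range(n_iter):                           # fixed iteration count (assumption)
        p = (1 - gamma) * M @ p + gamma * p0
    per_layer = p.reshape(L, n)                       # row k holds layer k
    return np.prod(per_layer, axis=0) ** (1.0 / L)    # geometric mean across layers

# Two toy layers over 3 miRNAs: layer 1 links 0-1, layer 2 links 1-2
layer1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
layer2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
toy_scores = multiplex_rwr([layer1, layer2], seeds=[0])
```

In the toy example miRNA 2 is isolated in the first layer, yet it still receives a nonzero score because the walker can jump to the second layer, where it is connected. This is the complementary effect the multiplex structure is meant to capture.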

RWR algorithm on heterogeneous and multiplex-heterogeneous networks

For heterogeneous networks (e.g. DiSimNet-MonoNet_miRWalk), we used the RWRH (RWR algorithm on a heterogeneous network) algorithm [53], which extends RWR to rank both diseases and miRNAs simultaneously. The adjacency matrix is:

$$A_H = \begin{bmatrix} A_D & B \\ B^{T} & A_R \end{bmatrix}$$

where $A_D$ is the disease similarity matrix, $A_R$ is the miRNA monoplex matrix, and B is the bipartite matrix of known disease-miRNA associations. The transition matrix $M_H$ is:

$$M_H = \begin{bmatrix} M_D^{H} & M_{DR}^{H} \\ M_{RD}^{H} & M_R^{H} \end{bmatrix}$$

where $M_D^{H}$ and $M_R^{H}$ are intra-subnetwork transition matrices (column-normalized $A_D$ and $A_R$), and $M_{DR}^{H}$ and $M_{RD}^{H}$ are inter-subnetwork transition matrices. The probability of transitioning from the disease to the miRNA network (or vice versa) is controlled by a jumping probability $\lambda \in [0,1]$. Specifically, $M_{DR}^{H} = \lambda B$ and $M_{RD}^{H} = \lambda B^{T}$, with the remaining probability $(1-\lambda)$ staying within the same subnetwork. The RWR equation is:

$$P_{t+1}^{H} = (1-\gamma) M_H P_t^{H} + \gamma P_0^{H}$$

where $P_t^{H}$ and $P_{t+1}^{H}$ are of dimension (n + m) × 1 (with n miRNAs and m diseases), and the initial probability vector is:

$$P_0^{H} = \begin{bmatrix} (1-\eta) P_R^{0} \\ \eta P_D^{0} \end{bmatrix}$$

Here, $P_D^{0}$ assigns a probability of 1 to the disease of interest and 0 to others, and $\eta \in [0,1]$ balances the importance of the disease and miRNA networks. This approach was used in RWRHMDA [36].
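The sketch below shows one way to assemble a column-stochastic transition matrix for the heterogeneous network. It rests on assumptions that go beyond the text: the bipartite blocks are column-normalized before scaling by λ, nodes without cross-network links keep their full probability inside their own subnetwork, and the seed vector is ordered with the disease block first to match the block order of the adjacency matrix. All names and toy matrices are hypothetical.

```python
import numpy as np

def _col_norm(X):
    """Column-normalize X, leaving all-zero columns as zeros."""
    X = np.asarray(X, dtype=float)
    s = X.sum(axis=0, keepdims=True)
    return np.divide(X, s, out=np.zeros_like(X), where=s > 0)

def heterogeneous_transition(A_D, A_R, B, lam=0.5):
    """Transition matrix on [diseases; miRNAs]; B has shape (m diseases, n miRNAs)."""
    B = np.asarray(B, dtype=float)
    dis_linked = B.sum(axis=1) > 0    # diseases with at least one known miRNA
    mir_linked = B.sum(axis=0) > 0    # miRNAs with at least one known disease
    # Linked nodes stay in their subnetwork with probability 1 - lam, others with 1
    M_D = _col_norm(A_D) * np.where(dis_linked, 1 - lam, 1.0)
    M_R = _col_norm(A_R) * np.where(mir_linked, 1 - lam, 1.0)
    M_DR = lam * _col_norm(B)         # from a miRNA (column) to a disease (row)
    M_RD = lam * _col_norm(B.T)       # from a disease (column) to a miRNA (row)
    return np.block([[M_D, M_DR], [M_RD, M_R]])

def rwrh(A_D, A_R, B, disease, seed_mirnas,
         gamma=0.5, lam=0.5, eta=0.5, n_iter=200):
    """RWR on the heterogeneous network; returns the miRNA scores only."""
    m, n = np.asarray(B).shape
    M = heterogeneous_transition(A_D, A_R, B, lam)
    p_d = np.zeros(m)
    p_d[disease] = 1.0
    p_r = np.zeros(n)
    p_r[list(seed_mirnas)] = 1.0 / len(seed_mirnas)
    p0 = np.concatenate([eta * p_d, (1 - eta) * p_r])  # disease block first
    p = p0.copy()
    for _ in range(n_iter):
        p = (1 - gamma) * M @ p + gamma * p0
    return p[m:]

# Toy network: 2 diseases, 3 miRNAs (chain 0-1-2), hypothetical associations
A_D = np.array([[0, 1], [1, 0]], dtype=float)
A_R = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
B = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)
M_toy = heterogeneous_transition(A_D, A_R, B)
toy_scores = rwrh(A_D, A_R, B, disease=0, seed_mirnas=[0])
toy_ranking = [int(i) for i in np.argsort(-toy_scores)]
```

Every column of the toy transition matrix sums to 1, so the total probability is conserved during the walk. A node that has cross-network links but no neighbors in its own subnetwork would break this property and would need separate handling.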

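For the multiplex-heterogeneous network (DiSimNet-MultiNet_miRNA), we connected the disease similarity network to each layer of the miRNA multiplex network using L identical bipartite matrices $B_{[k]}$. The adjacency matrix is:

$$A_{MH} = \begin{bmatrix} A_D & B_{MH} \\ (B_{MH})^{T} & A_{RM} \end{bmatrix}, \qquad B_{MH} = \begin{bmatrix} B_{[1]} & \cdots & B_{[L]} \end{bmatrix}$$

Following the same equations as for the heterogeneous network, we can calculate the transition matrix of the multiplex-heterogeneous network:

$$M_{MH} = \begin{bmatrix} M_D^{MH} & M_{DR}^{MH} \\ M_{RD}^{MH} & M_R^{MH} \end{bmatrix}$$

Therefore, the RWR equation for the multiplex-heterogeneous network becomes [52]:

$$P_{t+1}^{MH} = (1-\gamma) M_{MH} P_t^{MH} + \gamma P_0^{MH}$$

where $P_{t+1}^{MH}$, $P_t^{MH}$, and $P_0^{MH}$ are of dimension (n × L + m), and

$$P_0^{MH} = \begin{bmatrix} (1-\eta) P_{RM}^{0} \\ \eta P_D^{0} \end{bmatrix}$$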

The final scores for miRNAs were extracted from the steady-state probabilities corresponding to the miRNA nodes. We named this approach MHMDA-MH.
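As an illustration of the block structure, the following sketch assembles the multiplex-heterogeneous adjacency matrix from a disease similarity matrix, a list of miRNA layers, and one bipartite association matrix that is reused for every layer. The function name and the toy matrices are hypothetical, and the sketch assumes at least two layers and a bipartite matrix of shape (diseases × miRNAs).

```python
import numpy as np

def multiplex_heterogeneous_adjacency(A_D, layers, B, delta=0.5):
    """Assemble the block matrix [[A_D, B_MH], [B_MH^T, A_RM]] for L >= 2 layers."""
    L, n = len(layers), layers[0].shape[0]
    A_RM = np.zeros((n * L, n * L))
    for i in range(L):
        for j in range(L):
            if i == j:
                block = (1 - delta) * layers[i]       # within-layer edges
            else:
                block = delta / (L - 1) * np.eye(n)   # same miRNA across layers
            A_RM[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    B_MH = np.hstack([B] * L)   # the same associations attached to every layer
    return np.block([[A_D, B_MH], [B_MH.T, A_RM]])

# Toy inputs: 2 diseases, 3 miRNAs, 2 layers (hypothetical values)
A_D = np.array([[0, 1], [1, 0]], dtype=float)
layer1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
layer2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
B = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)
A_MH = multiplex_heterogeneous_adjacency(A_D, [layer1, layer2], B)
```

The resulting matrix has m + n × L rows and columns (8 in the toy case), matching the dimension of the probability vectors in the RWR equation.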

Performance evaluation

We evaluated the prediction performance of MHMDA using LOOCV on the known disease-miRNA associations from miR2Disease and HMDD. LOOCV was chosen because it maximizes the use of limited known associations, providing a robust estimate of performance despite the small number of positives. For each disease d, we considered only diseases with at least two known associated miRNAs to ensure sufficient seed nodes for LOOCV. The set of known associated miRNAs (D) and candidate miRNAs (C, all miRNAs not associated with d) were defined. In each iteration, one miRNA s ∈ D was held out as the test case, and the remaining miRNAs in D\{s} were used as seed nodes. The RWR algorithm ranked all miRNAs in C∪{s}.

We computed true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) based on a rank threshold τ:  

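$$\mathrm{TP} = \sum_{s \in S} I(\mathrm{rank}(s) \le \tau), \qquad \mathrm{FN} = \sum_{s \in S} I(\mathrm{rank}(s) > \tau)$$

$$\mathrm{FP} = \sum_{c \in C} I(\mathrm{rank}(c) \le \tau), \qquad \mathrm{TN} = \sum_{c \in C} I(\mathrm{rank}(c) > \tau)$$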

where rank(s) and rank(c) are the ranks of the held-out miRNA s and candidate miRNA c, respectively, and I(·) is the indicator function. We then calculated the true positive rate (TPR), false positive rate (FPR), precision, and recall:

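$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \qquad \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$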

By varying τ from 1 to the number of miRNAs in C∪{s}, we generated receiver operating characteristic and precision-recall curves, computing the area under the curves [AUROC and area under the precision-recall curve (AUPRC)] as performance metrics. AUROC measures overall ranking performance, while AUPRC is more sensitive to the imbalanced nature of the dataset, where known miRNAs are a small fraction of all miRNAs [54] [e.g. an average of 2.3 miRNAs are known to be associated with a disease in miR2Disease and HMDD, representing about 0.215% (ranging from 0.13% to 0.3%) of the remaining miRNAs, depending on the miRNA monoplex networks]. We compared MHMDA (MHMDA-M and MHMDA-MH) against RWRMDA and RWRHMDA to assess the benefit of integrating multiple miRNA networks and disease similarity information.
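The rank-threshold evaluation can be sketched for a single leave-one-out fold as follows. The function names and the toy ranks are hypothetical; how ranks are pooled across folds and diseases is not specified here and is left out of the sketch. The area is computed with a hand-written trapezoidal rule.

```python
import numpy as np

def confusion_at_threshold(heldout_ranks, candidate_ranks, tau):
    """Count TP, FN, FP, TN for a rank threshold tau (rank 1 is the best)."""
    s = np.asarray(heldout_ranks)
    c = np.asarray(candidate_ranks)
    tp = int((s <= tau).sum())
    fn = int((s > tau).sum())
    fp = int((c <= tau).sum())
    tn = int((c > tau).sum())
    return tp, fn, fp, tn

def roc_points(heldout_ranks, candidate_ranks):
    """Sweep tau from 1 to the largest rank and collect (FPR, TPR) points."""
    max_rank = max(max(heldout_ranks), max(candidate_ranks))
    fpr, tpr = [0.0], [0.0]
    for tau in range(1, max_rank + 1):
        tp, fn, fp, tn = confusion_at_threshold(heldout_ranks, candidate_ranks, tau)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr)

def area_under_curve(x, y):
    """Trapezoidal area under the points (x, y), with x non-decreasing."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy fold: the held-out miRNA is ranked 2nd among 5 ranked miRNAs
toy_heldout = [2]
toy_candidates = [1, 3, 4, 5]
toy_counts = confusion_at_threshold(toy_heldout, toy_candidates, tau=2)
toy_fpr, toy_tpr = roc_points(toy_heldout, toy_candidates)
toy_auroc = area_under_curve(toy_fpr, toy_tpr)
```

In the toy fold, three of the four candidates are ranked below the held-out miRNA, so the AUROC is 0.75. This matches the interpretation of AUROC as the probability that a known association is ranked above a randomly chosen candidate.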

Results

We evaluated the performance of MHMDA in predicting disease-miRNA associations using two datasets: miR2Disease and HMDD, with the aim of demonstrating its effectiveness across both rather than comparing the datasets. MHMDA was implemented in two configurations: MHMDA-M, which uses a multiplex miRNA network (MultiNet_miRNA), and MHMDA-MH, which uses a multiplex-heterogeneous network (DiSimNet-MultiNet_miRNA). These were compared against RWRMDA (monoplex miRNA network) and RWRHMDA (heterogeneous network) to assess the benefit of integrating multiple miRNA networks and disease similarity information. We also applied MHMDA-MH to predict novel disease-miRNA associations, validating the predictions using public databases and literature.

Parameter settings

The RWR algorithm used in MHMDA depends on several parameters that influence its prediction performance on different network types. Prior studies on RWR-based methods for predicting disease-associated biomarkers, such as disease genes, have extensively analyzed parameters like the restart probability (γ), the jumping probability between disease and miRNA networks (λ), and the importance weight of disease vs. miRNA seeds (η) in heterogeneous networks, demonstrating their stability across a range of values [52, 53, 55]. Since the primary goal of this study is to demonstrate that integrating multiple miRNA functional similarity networks improves the prediction of disease-miRNA associations, we focused on two key parameters: the restart probability (γ) and the jumping probability between miRNA layers in the multiplex network (δ).

The restart probability γ ∈ [0,1] controls the likelihood that the random walker returns to the seed nodes (known disease-associated miRNAs) at each step, balancing the exploration of local neighborhoods with global network structure. The jumping probability δ ∈ [0,1] determines the likelihood of the walker transitioning between layers of the miRNA multiplex network (MultiNet_miRNA), reflecting the complementary nature of experimentally validated (miRWalk) and predicted (TargetScan) miRNA-target interactions. For example, a higher δ allows the walker to leverage functional similarities captured in both layers, potentially identifying miRNAs that regulate shared pathways across different data sources. We evaluated two network configurations: the miRNA multiplex network (MultiNet_miRNA, corresponding to MHMDA-M) and the multiplex-heterogeneous network (DiSimNet-MultiNet_miRNA, corresponding to MHMDA-MH). In the multiplex network, each layer represents a miRNA monoplex network, and the importance weights of the layers $(\tau_1, \ldots, \tau_L)$ were set to $\tau_1 = \cdots = \tau_L = 1/L$ (where L = 2), assuming equal contribution from each layer due to the lack of prior knowledge about their relative biological importance.

To investigate the effect of γ, we varied it over the values 0.1, 0.3, 0.5, 0.7, and 0.9 while keeping the other parameters constant (δ = λ = η = 0.5). We assessed the prediction performance using the AUROC on the HMDD and miR2Disease datasets. Figure 2a and c show the AUROC as a function of γ for HMDD and miR2Disease, respectively. For the multiplex network (MHMDA-M), the AUROC remained nearly unchanged across the range of γ values, varying by less than 0.24% (e.g. from 0.825 to 0.827 for HMDD). For the multiplex-heterogeneous network (MHMDA-MH), the AUROC decreased slightly as γ increased, dropping by approximately 2.6% (e.g. from 0.944 at γ = 0.1 to 0.920 at γ = 0.9 for HMDD), suggesting that a higher restart probability may overly constrain the walker to seed nodes, limiting its ability to explore disease similarity information.

Figure 2.

Prediction performance of MHMDA-M and MHMDA-MH across parameter settings. This figure shows the prediction performance (AUROC) of MHMDA-M (miRNA multiplex network) and MHMDA-MH (multiplex-heterogeneous network) as a function of parameter changes, with line plots where the x-axis represents the parameter value and the y-axis represents AUROC. (a) Performance on the HMDD dataset with varying restart probability (γ). (b) Performance on the HMDD dataset with varying jumping probability (δ) between miRNA networks. (c) Performance on the miR2Disease dataset with varying restart probability (γ). (d) Performance on the miR2Disease dataset with varying jumping probability (δ) between miRNA networks. (e) Performance on the HMDD dataset with varying jumping probability between disease and miRNA networks (λ). (f) Performance on the HMDD dataset with varying importance weight of disease vs. miRNA seeds (η). (g) Performance on the miR2Disease dataset with varying jumping probability between disease and miRNA networks (λ). (h) Performance on the miR2Disease dataset with varying importance weight of disease vs. miRNA seeds (η)

Next, we varied δ over the values 0.1, 0.3, 0.5, 0.7, and 0.9 while keeping the other parameters constant (γ = λ = η = 0.5). Figure 2b and d show the AUROC as a function of δ for HMDD and miR2Disease, respectively. For the multiplex-heterogeneous network (MHMDA-MH), the AUROC remained stable, varying by less than 0.64% across the range (e.g. from 0.937 to 0.943 for HMDD). For the multiplex network (MHMDA-M), the AUROC decreased slightly as δ increased, dropping by about 1.7% (e.g. from 0.833 at δ = 0.1 to 0.819 at δ = 0.9 for HMDD), indicating that excessive jumping between layers may dilute the walker’s focus on within-layer functional similarities.

As in prior studies [52, 53, 55], we additionally analyzed the jumping probability between disease and miRNA networks (λ) and the importance weight of disease versus miRNA seeds (η) in multiplex-heterogeneous networks. We varied either λ or η in the range [0.1, 0.3, 0.5, 0.7, 0.9] while keeping other parameters at 0.5. Figure 2e–h also demonstrates their stability across the range of values, with variations in AUROC typically less than 2.6%.

Based on these results, the prediction performance of both MHMDA-M and MHMDA-MH is relatively stable with respect to changes in all the parameters. Figure 2a–d also show that MHMDA-MH consistently outperforms MHMDA-M across all parameter settings, highlighting the benefit of incorporating disease similarity information, as discussed in detail in the next subsection. Having established the stability of MHMDA-M and MHMDA-MH across various parameter settings, we set γ = δ = λ = η = 0.5 for all subsequent experiments. These values balance the exploration of local and global network structures (γ) and the integration of complementary miRNA layers (δ), while maintaining equal weighting for disease and miRNA networks (λ, η), providing a neutral baseline for the random walker’s behavior.

Integration of multiple miRNA networks improves the prediction performance

We evaluated the prediction performance of MHMDA using LOOCV on the miR2Disease and HMDD datasets. Performance was measured using the AUROC and the AUPRC. All experiments used the parameter settings determined in the previous section (γ = δ = λ = η = 0.5).

miRNA multiplex and monoplex networks

To demonstrate the efficacy of integrating multiple miRNA functional similarity networks, we compared the prediction performance of MHMDA-M, which uses the multiplex miRNA network (MultiNet_miRNA), against RWRMDA on three monoplex networks: MonoNet_miRWalk (constructed from miRWalk data), MonoNet_TargetScan (constructed from TargetScan data), and MonoNet_Integrated (an integrated network combining miRWalk and TargetScan data). The analysis was performed on two distinct databases of known disease-miRNA associations, HMDD and miR2Disease, to ensure the reliability and generalizability of the findings.

First, we compared the methods using AUROC. Figure 3a and b shows the ROC curves for HMDD and miR2Disease, respectively, with the x-axis representing the FPR and the y-axis representing the TPR. The miRNA multiplex network (MultiNet_miRNA) consistently demonstrated superior performance across both databases, achieving the highest AUROC values: 0.827 for HMDD and 0.900 for miR2Disease. For the HMDD dataset (Fig. 3a), the mean AUROC of MHMDA-M (0.827) was greater than that of RWRMDA (MonoNet_miRWalk) (0.690), RWRMDA (MonoNet_TargetScan) (0.727), and RWRMDA (MonoNet_Integrated) (0.801), corresponding to improvements of 19.9%, 13.8%, and 3.2%, respectively (Welch’s two-sample one-tailed t-test: all P values ≤ 6.27 × 10⁻⁶). Similarly, for the miR2Disease dataset (Fig. 3b), the mean AUROC of MHMDA-M (0.900) was greater than that of RWRMDA (MonoNet_miRWalk) (0.778), RWRMDA (MonoNet_TargetScan) (0.799), and RWRMDA (MonoNet_Integrated) (0.869), with improvements of 15.7%, 12.6%, and 3.6%, respectively (Welch’s two-sample one-tailed t-test: all P values ≤ 0.0374).
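The improvement percentages quoted above are relative gains over each baseline AUROC. The short sketch below recomputes them from the reported mean AUROC values; the function and variable names are illustrative.

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`, relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Mean AUROC values as reported in the text: (MHMDA-M, {baseline network: AUROC}).
reported = {
    "HMDD": (0.827, {"MonoNet_miRWalk": 0.690,
                     "MonoNet_TargetScan": 0.727,
                     "MonoNet_Integrated": 0.801}),
    "miR2Disease": (0.900, {"MonoNet_miRWalk": 0.778,
                            "MonoNet_TargetScan": 0.799,
                            "MonoNet_Integrated": 0.869}),
}

gains = {
    dataset: {net: round(relative_improvement(best, auc), 1) for net, auc in others.items()}
    for dataset, (best, others) in reported.items()
}
```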

Figure 3.

Performance comparison of MHMDA-M and RWRMDA on miRNA multiplex and monoplex networks. This figure compares the performance of MHMDA-M (MultiNet_miRNA) and RWRMDA (MonoNet_miRWalk, MonoNet_TargetScan, MonoNet_Integrated) using AUROC and AUPRC, with ROC curves (panels a and b) and precision-recall curves (panels c and d). (a) ROC curves for the HMDD dataset, with the x-axis as FPR and the y-axis as TPR. (b) ROC curves for the miR2Disease dataset. (c) Precision-recall curves for the HMDD dataset, with the x-axis as recall (TPR) and the y-axis as precision. (d) Precision-recall curves for the miR2Disease dataset. Color scheme: MHMDA-M in red, RWRMDA (MonoNet_miRWalk) in blue, RWRMDA (MonoNet_TargetScan) in purple, and RWRMDA (MonoNet_Integrated) in green.

Among the monoplex networks, RWRMDA (MonoNet_Integrated) consistently ranked second in performance, with AUROC values of 0.801 for HMDD and 0.869 for miR2Disease. It outperformed both RWRMDA (MonoNet_miRWalk) and RWRMDA (MonoNet_TargetScan) (Welch’s two-sample one-tailed t-test: all P values ≤ 4.6 × 10⁻⁴), underscoring the value of integrating multiple data sources even within a monoplex framework. The progression from individual monoplex networks (MonoNet_miRWalk, MonoNet_TargetScan) to the integrated monoplex network (MonoNet_Integrated) and finally to the multiplex network (MultiNet_miRNA) shows a clear trend of incremental improvement, demonstrating the cumulative benefits of data integration and network complexity in capturing disease-miRNA relationships.
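For reference, Welch's two-sample t-test compares two means without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The sample values are hypothetical and are not data from this study, and which AUROC samples entered the reported tests is not restated here. A one-tailed P value would then come from the Student t distribution with these degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="greater")`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for the alternative mean(a) > mean(b)."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical AUROC samples for two methods (illustration only).
method_a = [0.83, 0.82, 0.84, 0.83, 0.82]
method_b = [0.80, 0.79, 0.81, 0.80, 0.80]
t_stat, dof = welch_t(method_a, method_b)
```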

We also compared the methods using AUPRC, which is more sensitive to performance on the minority class (known associations) in imbalanced datasets. Figure 3c and d shows the precision-recall curves for HMDD and miR2Disease, respectively, with the x-axis representing recall and the y-axis representing precision. MHMDA-M achieved the highest AUPRC values: 0.007 for HMDD and 0.028 for miR2Disease, followed by RWRMDA (MonoNet_Integrated) with AUPRC values of 0.004 and 0.016, respectively.

Given the extreme data imbalance (i.e. known disease-associated miRNAs amount to about 0.215% of the remaining miRNAs, or one positive for every 465 negatives), the AUPRC values are low in absolute terms. The baseline AUPRC for a random classifier on such an imbalanced dataset is approximately equal to the proportion of positive instances (i.e. 0.00215). Against this baseline, MHMDA-M’s AUPRC of 0.028 for miR2Disease is about 13 times higher, and its AUPRC of 0.007 for HMDD is about 3 times higher, indicating a clear ability to distinguish true positives from negatives in this challenging setting.
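The fold changes over the random baseline can be checked with a few lines of arithmetic. The sketch below uses the ratio of one positive per 465 negatives quoted in the text; the variable names are illustrative.

```python
negatives_per_positive = 465              # ratio quoted in the text

# Proportion of positives among all candidates, and the positive-to-negative ratio.
prevalence = 1 / (1 + negatives_per_positive)
pos_to_neg = 1 / negatives_per_positive   # about 0.00215, the baseline used in the text

# AUPRC values reported for MHMDA-M.
auprc = {"miR2Disease": 0.028, "HMDD": 0.007}
fold_over_baseline = {name: value / pos_to_neg for name, value in auprc.items()}
```

Both ways of expressing the baseline round to 0.00215, so the fold changes (about 13 and about 3) do not depend on which one is used.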

The superior performance of MHMDA-M suggests that the multiplex miRNA network captures a broader range of functional relationships by integrating experimentally validated (miRWalk) and predicted (TargetScan) interactions. For example, miRNAs that share targets in both databases are likely to regulate overlapping pathways, such as the PI3K-Akt signaling pathway, which is frequently dysregulated in cancer [27]. The multiplex structure leverages these relationships to improve prediction accuracy, as evidenced by the consistent outperformance across both databases. These findings highlight the potential of the multiplex approach to enhance the identification of disease-associated miRNAs, offering a promising direction for biomarker discovery in diseases where miRNA dysregulation plays a critical role, such as cancer and neurodegenerative disorders.

As TargetScan, a predicted miRNA-target dataset, could introduce bias due to its reliance on specific prediction algorithms, we replaced MonoNet_TargetScan with another miRNA monoplex network, MonoNet_miRTarBase, constructed from the experimentally validated miRNA-target dataset miRTarBase [38]. Supplementary Figure S1a–d (see online supplementary material for a color version of this figure) shows the performance comparison of MHMDA-M and RWRMDA on miRNA multiplex and monoplex networks using two experimentally validated miRNA-target gene datasets, miRWalk and miRTarBase. We observed similar results to those in Fig. 3, where MHMDA-M on the miRNA multiplex network outperformed RWRMDA on the three monoplex networks (i.e. MonoNet_Integrated, MonoNet_miRTarBase, and MonoNet_miRWalk) in terms of both AUROC and AUPRC values.

In constructing the miRNA integrated network (i.e. MonoNet_Integrated) from MonoNet_miRWalk and MonoNet_TargetScan for the RWRMDA method, we chose the per-edge average method to balance the contributions of each network. To further justify this integration strategy, we explored a weighted average approach based on the size of the source networks. Specifically, weights were determined using the node ratio (node_ratio) and edge ratio (edge_ratio) between the two networks. These ratios were also applied to weight (τ) the contribution of each layer (miRNA monoplex network) in constructing the miRNA multiplex network for the MHMDA-M method. Supplementary Figure S2a–d (see online supplementary material for a color version of this figure) presents the performance comparison of MHMDA-M on the miRNA multiplex network and RWRMDA on the miRNA integrated network across different integration strategies. The results show that MHMDA-M consistently outperforms RWRMDA across all strategies (i.e. node_ratio and edge_ratio), regardless of the dataset.
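The two integration strategies described above can be illustrated with a small sketch. It rests on stated assumptions rather than the exact implementation: networks are dictionaries mapping an undirected edge to a similarity, an edge absent from one network counts as similarity 0 there, and the size-based weight is taken as one network's share of the combined node or edge count. All names and toy values are hypothetical.

```python
def edge_key(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def integrate(net_a, net_b, weight_a=0.5):
    """Weighted per-edge average of two {edge: similarity} networks.

    Assumption: an edge absent from one network counts as similarity 0 there.
    weight_a = 0.5 gives the plain per-edge average.
    """
    edges = set(net_a) | set(net_b)
    return {e: weight_a * net_a.get(e, 0.0) + (1.0 - weight_a) * net_b.get(e, 0.0)
            for e in edges}


def size_ratio_weight(size_a, size_b):
    """Hypothetical size-based weight for network A (from node or edge counts)."""
    return size_a / (size_a + size_b)


# Toy networks with made-up miRNA labels and similarities.
net_a = {edge_key("m1", "m2"): 0.8, edge_key("m1", "m3"): 0.4}
net_b = {edge_key("m2", "m1"): 0.6, edge_key("m2", "m3"): 0.2}

plain_average = integrate(net_a, net_b)
edge_weighted = integrate(net_a, net_b, size_ratio_weight(len(net_a), len(net_b)))
```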

Multiplex-heterogeneous and heterogeneous networks

To further demonstrate the benefit of integrating multiple miRNA functional similarity networks, we compared the prediction performance of MHMDA-MH, which uses the multiplex-heterogeneous network (DiSimNet-MultiNet_miRNA), against RWRHMDA on two heterogeneous networks: DiSimNet-MonoNet_miRWalk [i.e. RWRHMDA (MonoNet_miRWalk)] and DiSimNet-MonoNet_TargetScan [i.e. RWRHMDA (MonoNet_TargetScan)]. The comparison was conducted on the HMDD and miR2Disease datasets, ensuring the reliability and generalizability of the findings.

MHMDA-MH consistently demonstrated the best performance across both disease datasets (HMDD and miR2Disease). For AUROC, MHMDA-MH achieved values of 0.938 for HMDD and 0.913 for miR2Disease, compared to 0.914 and 0.892 for RWRHMDA (MonoNet_miRWalk), and 0.901 and 0.879 for RWRHMDA (MonoNet_TargetScan), respectively. These improvements [2.6% and 2.4% over RWRHMDA (MonoNet_miRWalk), 4.1% and 3.9% over RWRHMDA (MonoNet_TargetScan)] were statistically significant (Welch’s two-sample one-tailed t-test: all P values ≤ 1.1 × 10⁻⁴). Figure 4a and b shows the ROC curves for HMDD and miR2Disease, respectively, with the x-axis representing the FPR and the y-axis representing the TPR. In both panels, the ROC curve for MHMDA-MH (red) lies above those for RWRHMDA (MonoNet_miRWalk) (green) and RWRHMDA (MonoNet_TargetScan) (blue).

Figure 4.

Performance comparison of MHMDA-MH and RWRHMDA on multiplex-heterogeneous and heterogeneous networks. This figure compares the performance of MHMDA-MH (DiSimNet-MultiNet_miRNA) and RWRHMDA (DiSimNet-MonoNet_miRWalk, DiSimNet-MonoNet_TargetScan) using AUROC and AUPRC, with ROC curves (panels a and b) and precision-recall curves (panels c and d). (a) ROC curves for the HMDD dataset, with the x-axis as FPR and the y-axis as TPR. (b) ROC curves for the miR2Disease dataset. (c) Precision-recall curves for the HMDD dataset, with the x-axis as recall (TPR) and the y-axis as precision. (d) Precision-recall curves for the miR2Disease dataset. Color scheme: MHMDA-MH in red, RWRHMDA (DiSimNet-MonoNet_miRWalk) in green, and RWRHMDA (DiSimNet-MonoNet_TargetScan) in blue.

Similarly, for AUPRC, MHMDA-MH achieved the highest values: 0.019 for HMDD and 0.040 for miR2Disease, compared to 0.007 and 0.018 for RWRHMDA (MonoNet_miRWalk), and 0.009 and 0.014 for RWRHMDA (MonoNet_TargetScan), respectively. Figure 4c and d shows the precision-recall curves for HMDD and miR2Disease, respectively, with the x-axis representing recall (true positive rate) and the y-axis representing precision (positive predictive value). In Fig. 4d, MHMDA-MH reaches a peak precision of 0.08 at a recall of 0.3 for miR2Disease, compared to 0.06 for RWRHMDA (MonoNet_miRWalk). Given the extreme data imbalance, the AUPRC values are low in absolute terms but significantly higher than the baseline for a random classifier (i.e. 0.00215). MHMDA-MH’s AUPRC of 0.019 for HMDD is about 9 times higher than the baseline, and its AUPRC of 0.040 for miR2Disease is about 19 times higher, indicating a strong ability to identify true positives in this challenging setting.
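To make the precision-recall quantities concrete, the sketch below derives (recall, precision) points from a ranked list and summarizes them as average precision, one common estimator of the area under the precision-recall curve. Whether the reported AUPRC values used this estimator or another interpolation is not stated here, and the names and toy data are hypothetical.

```python
def precision_recall_points(scores, labels):
    """(recall, precision) after each rank position, scanning from the highest score down."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    points, hits = [], 0
    for k, i in enumerate(order, start=1):
        hits += labels[i]
        points.append((hits / total_pos, hits / k))
    return points


def average_precision(scores, labels):
    """Step-wise area under the precision-recall points (mean precision at each positive)."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(scores, labels):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical candidate scores; label 1 marks a known association.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_points = precision_recall_points(toy_scores, toy_labels)
toy_ap = average_precision(toy_scores, toy_labels)
```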

The superior performance of MHMDA-MH underscores the effectiveness of combining multiplex miRNA networks with disease similarity information. The disease similarity network leverages the “disease module” principle, where phenotypically similar diseases share associated miRNAs [27, 28]. For example, diseases like lung cancer and breast cancer, which share phenotypic traits such as metastatic potential, are likely to involve common miRNAs that regulate key processes like epithelial-mesenchymal transition, a critical step in cancer progression. The multiplex miRNA network further enhances this by capturing a broader range of functional relationships, leading to more accurate predictions. The consistency of these results across both HMDD and miR2Disease databases provides robust evidence for the generalizability of the multiplex-heterogeneous approach, suggesting that it offers a promising direction for enhancing disease-miRNA association predictions and identifying biologically relevant associations for biomarker discovery.

Similarly, we replaced MonoNet_TargetScan with MonoNet_miRTarBase. Supplementary Figure S3a–d (see online supplementary material for a color version of this figure) shows the performance comparison of MHMDA-MH and RWRHMDA on multiplex-heterogeneous and heterogeneous networks using two experimentally validated miRNA-target gene datasets, miRWalk and miRTarBase. We observed similar results to those in Fig. 4, where MHMDA-MH on the multiplex-heterogeneous network outperformed RWRHMDA on the two heterogeneous networks built from the monoplex networks (i.e. MonoNet_miRTarBase and MonoNet_miRWalk) in terms of both AUROC and AUPRC values.

Comparison with other methods

To provide a broader evaluation of MHMDA, we explored comparisons with recent state-of-the-art methods beyond RWR-based approaches, including knowledge-driven approach to the fine-grained prediction of disease-related miRNA (KDFGMDA) [26], MHXGMDA [56], and MAMFGAT [46]. Direct comparisons with KDFGMDA and MHXGMDA were not feasible due to methodological and data differences. KDFGMDA constructs a knowledge graph from HMDD, normalizing miRNA-disease relationships into triples, and employs a deep graph representation learning model with a heterogeneous neighbor encoder and LSTM (Long Short-Term Memory)-based aggregation, evaluated using global cross-validation. In contrast, MHMDA uses a multiplex-heterogeneous network with multiple miRNA similarity layers (e.g. miRWalk, TargetScan) and a disease similarity network (DiSimNet), applying a tailored RWR algorithm, and assesses performance with LOOCV per disease on HMDD and miR2Disease. Similarly, MHXGMDA integrates miRNA and disease similarity networks but lacks sufficient guidelines and accessible data for rerunning its code (https://github.com/yinboliu-git/MHXGMDA), preventing a direct comparison.

We selected MAMFGAT for a detailed comparison, as it also leverages miRNA functional similarity and disease similarity networks. We obtained its code from https://github.com/zixiaojin66/MAMFGAT-master and preprocessed our data according to its guidelines, converting edge lists of miRNA and disease similarity networks into similarity matrices and retaining only entities present in known associations. For HMDD, this resulted in a 185 × 185 disease similarity matrix, a 346 × 346 miRNA similarity matrix, and 2642 known associations; for miR2Disease, a 53 × 53 disease similarity matrix, a 116 × 116 miRNA similarity matrix, and 268 known associations. To align with MHMDA’s evaluation scheme, we modified MAMFGAT’s negative set selection from a random equal-sized sample to include all remaining associations, yielding negative-to-positive ratios of 23.22 for HMDD and 21.94 for miR2Disease. We additionally performed 5-fold cross-validation for MHMDA-MH to ensure consistency with MAMFGAT.
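The preprocessing and the resulting class ratios can be sketched as follows. The matrix conversion is an assumption-laden illustration (symmetric similarities, entities outside the retained list dropped, unspecified entries left at 0) and does not reproduce MAMFGAT's own input code. The ratio calculation uses only the matrix sizes and association counts given above and matches the reported ratios up to the last digit (23.23 when rounded versus the reported 23.22 for HMDD).

```python
def edge_list_to_matrix(edges, entities):
    """Symmetric similarity matrix restricted to `entities`; other nodes are dropped."""
    index = {name: i for i, name in enumerate(entities)}
    matrix = [[0.0] * len(entities) for _ in entities]
    for u, v, sim in edges:
        if u in index and v in index:
            matrix[index[u]][index[v]] = sim
            matrix[index[v]][index[u]] = sim
    return matrix


def negative_to_positive_ratio(n_diseases, n_mirnas, n_known):
    """All unlabelled disease-miRNA pairs divided by the number of known associations."""
    return (n_diseases * n_mirnas - n_known) / n_known


# Sizes reported in the text.
hmdd_ratio = negative_to_positive_ratio(185, 346, 2642)
mir2disease_ratio = negative_to_positive_ratio(53, 116, 268)

# Toy edge list with made-up disease labels; "d9" is not retained.
toy_matrix = edge_list_to_matrix([("d1", "d2", 0.5), ("d1", "d9", 0.3)], ["d1", "d2"])
```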

Under 5-fold cross-validation, MHMDA-MH and MAMFGAT were comparable on HMDD (AUROC of 0.934 vs. 0.937), whereas MHMDA-MH outperformed MAMFGAT on miR2Disease (AUROC of 0.94 vs. 0.875). This advantage on miR2Disease may be attributed to MHMDA’s multiplex-heterogeneous network design, which keeps multiple miRNA functional similarity networks (e.g. from miRWalk and TargetScan) as layers of a miRNA multiplex network and couples them with a disease similarity network. This design captures a broader range of biological relationships than MAMFGAT’s graph neural network (GNN)-based approach, which was run on the miRNA network integrated by per-edge averaging. The comparison highlights MHMDA’s competitive performance against state-of-the-art methods, particularly in leveraging its multiplex-heterogeneous network structure.

Evaluation of predictive performance across disease characteristics

To assess the model’s robustness across diverse disease characteristics, we investigated the impact of data imbalance and variations in disease categories and tissue specificity on prediction performance using the HMDD dataset. First, we examined whether data imbalance, where the number of known disease-associated miRNAs is much smaller than the total number of miRNAs, affects prediction accuracy. We calculated the correlation between the AUROC of each disease and its number of known associated miRNAs. Figure 5a reveals a negligible correlation (correlation coefficient = −0.06, P value = 0.432), suggesting that data imbalance does not significantly influence prediction performance. Next, we classified diseases in the HMDD dataset into categories (e.g. cancer, cardiovascular, immunological, metabolic) and tissues (e.g. blood, bone, brain, heart) and compared the average AUROC across these groups. Figure 5b and c demonstrates minimal variation in performance across disease categories and tissues, with differences typically less than 2%. These findings collectively indicate that the model maintains stability across various disease contexts, including rare and common diseases, and is not heavily impacted by tissue-specific factors.
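The correlation analysis can be sketched as below, assuming Pearson's coefficient; the text does not state which coefficient was used, and a rank-based coefficient would be computed differently. The per-disease values are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-disease values: number of known miRNAs and AUROC.
n_known = [2, 5, 10, 20, 40]
auroc_per_disease = [0.93, 0.95, 0.92, 0.94, 0.93]
r = pearson(n_known, auroc_per_disease)
```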

Figure 5.

Evaluation of prediction performance across diverse disease characteristics. (a) Correlation between AUROC and the number of known disease-associated miRNAs, showing a negligible correlation. (b) Average AUROC across disease categories (e.g. cancer, cardiovascular, immunological, metabolic). (c) Average AUROC across tissue types (e.g. blood, bone, brain, heart)

Prediction of novel disease-miRNA associations

To demonstrate the practical application of our method, we used MHMDA-MH to predict novel miRNA-disease associations by ranking miRNAs not known to be associated with each disease in the training data. The top 20 ranked miRNAs for each disease were considered promising candidates for novel associations. We validated these predictions using public databases and literature searches to assess their biological relevance.
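The candidate selection step amounts to ranking the miRNAs that are not yet linked to the disease and keeping the highest-scoring ones. A minimal sketch follows; the function name, the alphabetical tie-breaking rule, and the toy scores are assumptions made for illustration.

```python
def top_candidates(scores, known, k=20):
    """Rank miRNAs not already known for the disease by descending score; keep the top k."""
    candidates = [(mirna, s) for mirna, s in scores.items() if mirna not in known]
    # Sort by score (high to low); ties are broken alphabetically for reproducibility.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [mirna for mirna, _ in candidates[:k]]


# Toy scores with made-up miRNA labels; "mir-a" is already a known association.
toy_scores = {"mir-a": 0.9, "mir-b": 0.8, "mir-c": 0.7, "mir-d": 0.1}
toy_top2 = top_candidates(toy_scores, known={"mir-a"}, k=2)
```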

Novel associations supported by public databases

We validated the top 20 predictions for each disease using the multiMiR R package [57], which searches for miRNA-disease associations in two databases: miR2Disease and PhenomiR [58]. A prediction was considered validated if the miRNA-disease pair was reported in either database. Tables 3 and 4 list examples of validated disease-miRNA associations for diseases in HMDD and miR2Disease, respectively. Table 3 (HMDD) includes associations such as hsa-miR-520c-3p with breast cancer, validated in both miR2Disease (PubMed ID: 18193036) and PhenomiR (PubMed IDs: 16754881 and 18193036), suggesting its role in breast cancer progression through pathways like epithelial-mesenchymal transition. Table 4 (miR2Disease) includes associations like hsa-miR-125a-5p with prostate cancer, validated in both miR2Disease (PubMed ID: 17616669) and PhenomiR (PubMed IDs: 17616669 and 16192569), potentially linked to androgen receptor signaling, and hsa-miR-143-3p with cervical cancer, which may regulate cell proliferation. These validated predictions highlight MHMDA-MH’s ability to identify biologically relevant miRNAs across a range of disease contexts, including cancers and non-cancer conditions like schizophrenia and frontotemporal dementia.

Table 3.

Examples of validated novel disease–miRNA associations for top 20 ranked miRNAs in HMDD.

| miRNA | Disease | Database | PubMed ID |
| --- | --- | --- | --- |
| hsa-miR-107 | Esophageal cancer | miR2Disease | 18172293 |
| hsa-miR-107 | Medulloblastoma | PhenomiR | 18973228 |
| hsa-miR-107 | Schizophrenia | miR2Disease | 19721432 |
| hsa-miR-326 | Prostate cancer | PhenomiR | 16192569 |
| hsa-miR-429 | Lung cancer | miR2Disease | 19759262 |
| hsa-miR-449a | Breast cancer | PhenomiR | 16754881 |
| hsa-miR-515-3p | Breast cancer | PhenomiR | 16754881 |
| hsa-miR-519b-3p | Breast cancer | PhenomiR | 16754881 |
| hsa-miR-519c-3p | Breast cancer | PhenomiR | 16754881 |
| hsa-miR-520c-3p | Breast cancer | miR2Disease | 18193036 |
| hsa-miR-520c-3p | Breast cancer | PhenomiR | 16754881 |
| hsa-miR-520c-3p | Breast cancer | PhenomiR | 18193036 |
| hsa-miR-590-5p | Lung cancer | PhenomiR | 18766170 |
| hsa-miR-659-3p | Frontotemporal dementia | miR2Disease | 18723524 |

Table 4.

Examples of validated novel disease–miRNA associations for top 20 ranked miRNAs in miR2Disease.

| miRNA | Disease | Database | PubMed ID |
| --- | --- | --- | --- |
| hsa-miR-107 | Lung cancer | PhenomiR | 18766170 |
| hsa-miR-107 | Lung cancer | PhenomiR | 16192569 |
| hsa-miR-107 | Schizophrenia | miR2Disease | 19721432 |
| hsa-miR-125a-5p | Prostate cancer | miR2Disease | 17616669 |
| hsa-miR-125a-5p | Prostate cancer | PhenomiR | 17616669 |
| hsa-miR-125a-5p | Prostate cancer | PhenomiR | 16192569 |
| hsa-miR-126-3p | Asthma | miR2Disease | 19843690 |
| hsa-miR-143-3p | Cervical cancer | miR2Disease | 17616659 |
| hsa-miR-146b-5p | Prostate cancer | PhenomiR | 18174313 |
| hsa-miR-148a-3p | Asthma | miR2Disease | 17847008 |
| hsa-miR-148b-3p | Asthma | miR2Disease | 17847008 |
| hsa-miR-152-3p | Asthma | miR2Disease | 17847008 |
| hsa-miR-330-5p | Medulloblastoma | PhenomiR | 18973228 |
| hsa-miR-449a | Colorectal cancer | PhenomiR | 18663744 |
| hsa-miR-659-3p | Frontotemporal dementia | miR2Disease | 18723524 |

Novel associations supported by the literature

Public databases of known disease-miRNA associations are often limited by curation quality and updating frequency. To address this, we further investigated the top 20 ranked miRNAs by mining PubMed for evidence of associations not captured in miR2Disease or PhenomiR, using the easyPubMed R package [59]. We focused on lung cancer (MIM ID: 211980) and prostate cancer (MIM ID: 176807), two diseases with significant clinical relevance. For each miRNA, we searched PubMed using the query “(disease[Title/Abstract]) AND (miRNA[Title/Abstract])” and considered a prediction validated if at least one study was found, indicating potential experimental or clinical evidence.
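The query template above can be assembled programmatically for each disease-miRNA pair. The sketch below only builds the query string and applies the "at least one study" rule; it does not contact PubMed, and the function names are hypothetical. How disease names and miRNA identifiers were normalized before searching is not stated, so they are passed through unchanged here.

```python
def build_query(disease, mirna):
    """PubMed query restricting both terms to the title or abstract."""
    return f"({disease}[Title/Abstract]) AND ({mirna}[Title/Abstract])"


def is_validated(n_studies):
    """A prediction counts as validated when at least one study is retrieved."""
    return n_studies >= 1


example_query = build_query("lung cancer", "hsa-miR-155")
```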

Table 5 lists the top 20 ranked miRNAs for lung cancer in both HMDD and miR2Disease datasets. Detailed PubMed IDs for the studies are provided in Supplementary Tables S1 and S2 for lung cancer and prostate cancer, respectively. For the HMDD dataset, 16 of the top 20 miRNAs were validated with at least one study, including hsa-let-7a (rank 9, 8 studies), which is known to regulate lung cancer cell proliferation, and hsa-miR-141 (rank 7, 5 studies), potentially involved in metastasis. For the miR2Disease dataset, 13 of the top 20 were validated, including hsa-miR-145 (rank 8, 9 studies), which may inhibit tumor growth, and hsa-miR-155 (rank 1, 7 studies), often associated with chemotherapy resistance. Notably, hsa-miR-429, hsa-miR-590-5p, and hsa-miR-107 (italicized in Table 5) were already validated in miR2Disease and PhenomiR (see Tables 3 and 4), showing consistency between database and literature evidence.

Table 5.

Number of PubMed studies supporting associations with lung cancer for top 20 ranked miRNAs in HMDD and miR2Disease.

| Rank | HMDD: miRNA | HMDD: no. of studies | miR2Disease: miRNA | miR2Disease: no. of studies |
| --- | --- | --- | --- | --- |
| 1 | hsa-miR-106b | 3 | hsa-miR-155 | 7 |
| 2 | hsa-miR-20b | 1 | hsa-miR-4697-3p | 0 |
| 3 | hsa-miR-1 | 5 | hsa-let-7a | 8 |
| 4 | hsa-miR-15a | 3 | hsa-miR-302b | 0 |
| 5 | hsa-miR-302c | 0 | hsa-miR-20b | 1 |
| 6 | hsa-miR-195 | 4 | hsa-miR-93 | 3 |
| 7 | hsa-miR-141 | 5 | hsa-let-7c | 6 |
| 8 | hsa-miR-99a | 3 | hsa-miR-145 | 9 |
| 9 | hsa-let-7a | 8 | hsa-miR-935 | 0 |
| 10 | *hsa-miR-429* | 1 | hsa-miR-19a | 0 |
| 11 | hsa-miR-181a | 5 | hsa-miR-16 | 6 |
| 12 | hsa-miR-181b | 0 | hsa-miR-34a | 4 |
| 13 | hsa-miR-15b | 3 | hsa-miR-34b | 4 |
| 14 | hsa-miR-302b | 0 | hsa-miR-26a | 7 |
| 15 | hsa-miR-10a | 1 | hsa-miR-92a | 1 |
| 16 | hsa-miR-424 | 1 | hsa-miR-25 | 4 |
| 17 | hsa-miR-320a | 1 | hsa-miR-520g | 0 |
| 18 | hsa-miR-215 | 3 | hsa-miR-302c | 0 |
| 19 | hsa-miR-520g | 0 | hsa-miR-27a | 0 |
| 20 | *hsa-miR-590-5p* | 1 | *hsa-miR-107* | 2 |

miRNAs in italic (i.e. hsa-miR-429, hsa-miR-590-5p, and hsa-miR-107) were already recorded in the miR2Disease and PhenomiR databases.
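The validation counts quoted for lung cancer (16 of 20 for HMDD and 13 of 20 for miR2Disease) follow directly from the study counts in Table 5, as the short check below shows. The lists are transcribed from the table in rank order; the variable names are illustrative.

```python
# Number of supporting PubMed studies per rank (1 to 20), transcribed from Table 5.
hmdd_studies = [3, 1, 5, 3, 0, 4, 5, 3, 8, 1, 5, 0, 3, 0, 1, 1, 1, 3, 0, 1]
mir2disease_studies = [7, 0, 8, 0, 1, 3, 6, 9, 0, 0, 6, 4, 4, 7, 1, 4, 0, 0, 0, 2]


def count_validated(study_counts):
    """Number of candidates supported by at least one study."""
    return sum(1 for n in study_counts if n >= 1)


hmdd_validated = count_validated(hmdd_studies)
mir2disease_validated = count_validated(mir2disease_studies)
```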

For prostate cancer, similar literature searches were conducted (detailed in Supplementary Table S2). Of the top 20 miRNAs in the HMDD dataset, 10 were validated, including hsa-miR-326 (rank 4 in Table S2, 1 study), which may regulate prostate cancer cell growth. In the miR2Disease dataset, 12 of the top 20 were validated, including hsa-miR-125a-5p (rank 11 in Table S2, 2 studies) and hsa-miR-146b-5p (rank 2 in Table S2, 3 studies), both associated with tumor progression.

MiRNAs with zero studies (e.g. hsa-miR-302c and hsa-miR-181b for HMDD, and hsa-miR-4697-3p and hsa-miR-935 for miR2Disease in Table 5) may represent novel associations requiring further experimental validation, or they could be false positives due to limitations in current literature or noise in the network data (e.g. inaccurate miRNA-target interactions). These results demonstrate MHMDA-MH’s ability to identify biologically relevant miRNA-disease associations, offering a valuable tool for biomarker discovery in cancer research.

Computational efficiency assessment

To further evaluate the practical applicability of MHMDA-MH, we assessed the computational efficiency of the RWR algorithm on the multiplex-heterogeneous networks. We calculated the average prediction time per disease across the HMDD and miR2Disease datasets. The average time was 0.1199 seconds per disease for HMDD and 0.1110 seconds per disease for miR2Disease, demonstrating that the algorithm processes large-scale networks efficiently within a fraction of a second per prediction. This efficiency supports the scalability of MHMDA-MH for real-world applications, even with the increased complexity of integrating multiple network layers and disease similarity information.
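An average per-disease prediction time of this kind can be measured with a simple wall-clock loop. The sketch below is a generic illustration: `predict` is a hypothetical stand-in for one per-disease prediction and does not run the actual algorithm, so the measured numbers are not comparable to those reported above.

```python
import time


def mean_seconds_per_call(func, inputs):
    """Average wall-clock time of func(item) over all items."""
    start = time.perf_counter()
    for item in inputs:
        func(item)
    return (time.perf_counter() - start) / len(inputs)


def predict(disease_id):
    """Hypothetical stand-in for one per-disease prediction."""
    return sum(i * i for i in range(10_000))


avg_seconds = mean_seconds_per_call(predict, inputs=range(20))
```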

Discussion

Our study introduces MHMDA, a novel method for predicting disease-miRNA associations that leverages the power of multiplex-heterogeneous networks. The original goal of this study was to develop a robust and comprehensive method that integrates diverse biological data to enhance prediction accuracy and applicability across various disease contexts, overcoming the limitations of single-network approaches like RWRMDA and heterogeneous methods like RWRHMDA. The results demonstrate that MHMDA consistently outperforms existing approaches, offering a more comprehensive and accurate tool for identifying potential disease-associated miRNAs. Indeed, our experiments show that the multiplex miRNA network (MHMDA-M) consistently outperforms monoplex networks across different databases of known disease-miRNA associations (HMDD and miR2Disease). This underscores the value of integrating multiple sources of miRNA functional similarity data. Furthermore, the multiplex-heterogeneous network (MHMDA-MH) demonstrates superior performance compared to both multiplex and heterogeneous networks, highlighting the synergistic effect of combining multiplex miRNA networks with disease similarity information. In addition, MHMDA exhibits stable performance across different parameter values, indicating its resilience to parameter tuning. This robustness is particularly evident in the multiplex-heterogeneous network, suggesting that the integration of diverse biological information enhances the model’s stability. These results validate the study’s goal of improving prediction accuracy through a comprehensive network approach, as further supported by MHMDA’s consistent performance across disease categories and tissues and its negligible correlation with data imbalance. Moreover, the superior performance of MHMDA is observed consistently across different databases (HMDD and miR2Disease), reinforcing the generalizability of our approach. 
When compared to the state-of-the-art MAMFGAT method [46], MHMDA demonstrated similar AUROC performance on HMDD and improved AUROC on miR2Disease, evaluated through 5-fold cross-validation. Finally, the application of MHMDA to predict novel miRNAs associated with specific diseases demonstrates its potential for real-world impact in biomedical research.

These findings have several important implications for the field of disease-miRNA association prediction. First, our results emphasize the importance of integrating multiple sources of biological information, including various miRNA-target databases and disease similarity data, enabling a more nuanced and accurate representation of the complex relationships between miRNAs and diseases. Second, MHMDA’s success in leveraging both miRNA functional similarities and disease similarities provides strong support for the “disease module” principle, suggesting that considering the multifaceted nature of biological relationships leads to more accurate predictions. Third, the improved accuracy of MHMDA in predicting disease-miRNA associations opens new avenues for identifying potential biomarkers and therapeutic targets, which could accelerate the understanding of disease mechanisms and the development of novel treatments. Although computational predictions suggest potential new biomarkers and therapeutic targets, their confirmation requires experimental validation (e.g. qPCR in cell lines), which we plan to address through future partnerships with experimental researchers to overcome current resource limitations. MHMDA’s computational predictions fit into the biomarker discovery pipeline by generating prioritized candidate associations, which can be filtered for biological relevance (e.g. tissue specificity) and validated through in vitro or in vivo studies, ultimately supporting clinical trial design and personalized therapies. Practical barriers to this integration include the need for extensive experimental validation to confirm miRNA functionality, variability in clinical data availability across diseases, and the computational demands of scaling to genome-wide analyses, which necessitate advanced infrastructure.
Finally, the multiplex-heterogeneous network approach introduced in this study represents a significant methodological advancement in the field of network-based prediction methods, providing a framework that can potentially be extended to other areas of biological network analysis.

Despite these promising results, several limitations warrant consideration. First, while many predictions were validated using databases and literature, experimental validation is necessary to confirm the biological relevance of novel associations, such as unvalidated miRNAs (e.g. hsa-miR-302c). The high ranking of unvalidated miRNAs like hsa-miR-302c may reflect MHMDA’s ability to detect novel functional similarities, though this could also be influenced by noise in the network data or gaps in literature; future studies will explore these causes through sensitivity analyses or experimental validation. While computational predictions provide a strong foundation, their translation to clinical use requires empirical validation through biological experiments, which we aim to pursue through future collaborations given current resource limitations. Second, the miRNA-target interaction data (e.g. from miRWalk, TargetScan) may contain noise or biases, potentially leading to false positives in the network. Third, the extreme data imbalance may affect prediction accuracy, particularly for diseases with few known miRNAs. To safeguard against over-fitting, MHMDA employs several strategies: the use of LOOCV ensures evaluation on unseen data, reducing over-fitting risk, while the integration of multiple data sources introduces diverse biological signals, preventing reliance on a single dataset’s patterns. The parameter stability observed in Fig. 2 further indicates that MHMDA-MH does not overfit to specific parameter values, as performance remains consistent across a wide range. However, future enhancements could include regularization techniques (e.g. L2 regularization) or ensemble methods to further mitigate over-fitting risks.

Looking ahead, the future direction of miRNA-disease association prediction builds on MHMDA’s current strengths and addresses its limitations, including computational constraints and practical barriers. First, integrating additional omics data, such as RNA-seq-based gene expression profiles or single-cell sequencing data, could provide a more comprehensive view of miRNA-disease interactions, though obtaining high-quality, standardized datasets across diverse contexts remains challenging. Second, incorporating temporal and spatial data (e.g. miRNA expression changes over disease progression or tissue-specific profiles) could improve context-specific predictions, requiring longitudinal studies and advanced integration techniques to manage variability. Third, addressing data imbalance, particularly for rare diseases, is crucial; beyond oversampling, methods like cost-sensitive learning or synthetic data generation (e.g. with a variational autoencoder, VAE) could enhance performance. These enhancements aim to overcome practical barriers such as validation scalability and data variability, facilitating MHMDA’s integration into precision medicine. Finally, leveraging advanced computational methods, such as transformers or GNNs, could better capture complex, nonlinear relationships within multiplex-heterogeneous networks, though this requires addressing current computational resource limitations.

Building on the deep learning and network-based methods discussed in the Introduction, we envision a hybrid approach that combines their strengths to overcome limitations [26, 46, 56]. Deep learning models like transformers and GNNs extract complex features from high-dimensional data (e.g. miRNA sequences), but their reliance on large, high-quality datasets and computational resources poses challenges with sparse or noisy miRNA data (e.g. miRWalk, TargetScan). While deep learning tools like transformers can leverage additional dimensions such as sequence similarity, MHMDA prioritizes network-based relational data to ensure robustness with sparse miRNA data, though future work could explore integrating sequence features pending comprehensive datasets and computational support. Network-based approaches, like MHMDA’s multiplex-heterogeneous framework, provide prior knowledge and interpretability, mitigating these issues through relational integration (e.g. miRNA-disease associations). Combining these—e.g. using GNNs for embeddings and transformers for temporal changes—could enhance robustness, scalability, and clinical applicability, addressing data scarcity and context-specific needs.

In conclusion, MHMDA represents a significant advancement in disease-miRNA association prediction. By leveraging multiplex-heterogeneous networks, our method offers improved accuracy, robustness, and generalizability across disease contexts, with potential to accelerate biomedical research and support novel diagnostic and therapeutic strategies.

Supplementary Material

bpaf065_Supplementary_Data

Author contributions

The author (Duc-Hau Le) contributed to all stages of the study.

Supplementary data

Supplementary data is available at Biology Methods and Protocols online.

Conflict of interest statement. None declared.

Funding

None declared.

Data availability

Source code and experiment data can be accessed at https://github.com/hauldhut/MHMDA.

References

  • 1. He L, Hannon GJ. MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet 2004;5:522–31.
  • 2. Yang Z, Ren F, Liu C et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics 2010;11:S5.
  • 3. Conrad R, Barrier M, Ford LP. Role of miRNA and miRNA processing factors in development and disease. Birth Defects Res C Embryo Today 2006;78:107–17.
  • 4. Li Y, Kowdley KV. MicroRNAs in common human diseases. Genom Proteom Bioinform 2012;10:246–53.
  • 5. Mendell JT, Olson EN. MicroRNAs in stress signaling and human disease. Cell 2012;148:1172–87.
  • 6. Esteller M. Non-coding RNAs in human disease. Nat Rev Genet 2011;12:861–74.
  • 7. Steinfeld I, Navon R, Ach R et al. miRNA target enrichment analysis reveals directly active miRNAs in health and disease. Nucleic Acids Res 2013;41:e45.
  • 8. Salvatore M, Magrelli A, Taruscio D. The role of microRNAs in the biology of rare diseases. Int J Mol Sci 2011;12:6733–42.
  • 9. Rottiers V, Näär AM. MicroRNAs in metabolism and metabolic disorders. Nat Rev Mol Cell Biol 2012;13:239–50.
  • 10. Ali AS, Ali S, Ahmad A et al. Expression of microRNAs: potential molecular link between obesity, diabetes and cancer. Obes Rev 2011;12:1050–62.
  • 11. Chen X, Xie D, Zhao Q et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.
  • 12. Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models. Brief Bioinform 2022;23:bbac407.
  • 13. Xu J, Li C-X, Lv J-Y et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target-dysregulated network: case study of prostate cancer. Mol Cancer Ther 2011;10:1857–66.
  • 14. Chen X, Wu Q-F, Yan G-Y. RKNNMDA: ranking-based KNN for MiRNA-disease association prediction. RNA Biol 2017;14:952–62.
  • 15. Chen X, Huang L, Xie D et al. EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Dis 2018;9:3.
  • 16. Chen X, Zhu C-C, Yin J. Ensemble of decision tree reveals potential miRNA-disease associations. PLOS Comput Biol 2019;15:e1007209.
  • 17. Chen X, Yan G-Y. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep 2014;4:5501.
  • 18. Chen X, Wang L, Qu J et al. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018;34:4256–65.
  • 19. Chen X, Sun L-G, Zhao Y. NCMCMDA: miRNA–disease association prediction through neighborhood constraint matrix completion. Brief Bioinform 2021;22:485–96.
  • 20. Chen X, Yin J, Qu J et al. MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction. PLOS Comput Biol 2018;14:e1006418.
  • 21. Chen X, Li T-H, Zhao Y et al. Deep-belief network for predicting potential miRNA-disease associations. Brief Bioinform 2021;22:bbaa186.
  • 22. Peng J, Hui W, Li Q et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.
  • 23. Li Z, Li J, Nie R et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22:bbaa240.
  • 24. Chu Y, Wang X, Dai Q et al. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021;22:bbab165.
  • 25. Tang X, Luo J, Shen C et al. Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction. Brief Bioinform 2021;22:bbab174.
  • 26. Yu S, Wang H, Liu T et al. A knowledge-driven network for fine-grained relationship detection between miRNA and disease. Brief Bioinform 2022;23:bbac058.
  • 27. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004;116:281–97.
  • 28. Lu M, Zhang Q, Deng M et al. An analysis of human MicroRNA and disease associations. PLoS ONE 2008;3:e3420.
  • 29. Le D-H. Network-based ranking methods for prediction of novel disease associated microRNAs. Comput Biol Chem 2015;58:139–48.
  • 30. Jiang Q, Hao Y, Wang G et al. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol 2010;4:S2.
  • 31. Qinghua J, Yangyang H, Guohua W et al. Weighted network-based inference of human MicroRNA-disease associations. In: Frontier of Computer Science and Technology (FCST), 2010 Fifth International Conference on: 18–22 August 2010, 431–35.
  • 32. Wang D, Wang J, Lu M et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.
  • 33. Chen X, Liu M-X, Yan G-Y. RWRMDA: predicting novel human microRNA-disease associations. Mol Biosyst 2012;8:2792–8.
  • 34. Chen H, Zhang Z. Prediction of associations between OMIM diseases and microRNAs by random walk on OMIM disease similarity network. Sci World J 2013;2013:204658.
  • 35. Le D-H, Verbeke L, Son LH et al. Random walks on mutual microRNA-target gene interaction network improve the prediction of disease-associated microRNAs. BMC Bioinform 2017;18:479.
  • 36. Le D-H. Disease phenotype similarity improves the prediction of novel disease-associated microRNAs. In: Information and Computer Science (NICS), 2015 2nd National Foundation for Science and Technology Development Conference on: 16–18 September 2015, 76–81.
  • 37. Sticht C, De La Torre C, Parveen A et al. miRWalk: an online resource for prediction of microRNA binding sites. PLoS ONE 2018;13:e0206239.
  • 38. Huang H-Y, Lin Y-C-D, Cui S et al. miRTarBase update 2022: an informative resource for experimentally validated miRNA–target interactions. Nucleic Acids Res 2022;50:D222–D230.
  • 39. McGeary SE, Lin KS, Shi CY et al. The biochemical basis of microRNA targeting efficacy. Science 2019;366:eaav1741.
  • 40. Agarwal V, Bell GW, Nam J-W et al. Predicting effective microRNA target sites in mammalian mRNAs. eLife 2015;4:e05005.
  • 41. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are MicroRNA targets. Cell 2005;120:15–20.
  • 42. Friedman RC, Farh KK-H, Burge CB et al. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 2009;19:92–105.
  • 43. Chen Y, Wang X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res 2020;48:D127–131.
  • 44. Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: experimental results, databases, webservers and data fusion. Brief Bioinform 2022;23:bbac407.
  • 45. Kariuki D, Asam K, Aouizerat BE et al. Review of databases for experimentally validated human microRNA–mRNA interactions. Database 2023;2023:baad014.
  • 46. Jin Z, Wang M, Tang C et al. Predicting miRNA-disease association via graph attention learning and multiplex adaptive modality fusion. Comput Biol Med 2024;169:107904.
  • 47. Dweep H, Sticht C, Pandey P et al. miRWalk—database: prediction of possible miRNA binding sites by “walking” the genes of three genomes. J Biomed Inform 2011;44:839–47.
  • 48. Lewis BP, Shih I, Jones-Rhoades MW et al. Prediction of mammalian microRNA targets. Cell 2003;115:787–98.
  • 49. Jiang Q, Wang Y, Hao Y et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 2009;37:D98–104.
  • 50. Cui C, Zhong B, Fan R et al. HMDD v4.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res 2024;52:D1327–1332.
  • 51. van Driel MA, Bruggeman J, Vriend G et al. A text-mining analysis of the human phenome. Eur J Hum Genet 2006;14:535–42.
  • 52. Valdeolivas A, Tichit L, Navarro C et al. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 2019;35:497–505.
  • 53. Li Y, Patra JC. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics 2010;26:1219–24.
  • 54. Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: towards systematic evaluation of computational models. Brief Bioinform 2022;23:bbac407.
  • 55. Kohler S, Bauer S, Horn D et al. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet 2008;82:949–58.
  • 56. Wen S, Liu Y, Yang G et al. A method for miRNA-disease association prediction using machine learning decoding of multi-layer heterogeneous graph transformer encoded representations. Sci Rep 2024;14:24181.
  • 57. Ru Y, Kechris KJ, Tabakoff B et al. The multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations. Nucleic Acids Res 2014;42:e133.
  • 58. Ruepp A, Kowarsch A, Schmidl D et al. PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes. Genome Biol 2010;11:R6.
  • 59. Fantini D. easyPubMed: search and retrieve scientific publication records from PubMed. R Package (https://CRAN.R-project.org/package=easyPubMed)

Articles from Biology Methods & Protocols are provided here courtesy of Oxford University Press
