miRSM: an R package to infer and analyse miRNA sponge modules in heterogeneous data

Junpeng Zhang; Lin Liu; Taosheng Xu; Wu Zhang; Chunwen Zhao; Sijing Li; Jiuyong Li; Nini Rao; Thuc Duy Le

doi:10.1080/15476286.2021.1905341

2021 Apr 6;18(12):2308–2320. doi: 10.1080/15476286.2021.1905341

PMCID: PMC8632112 PMID: 33822666

ABSTRACT

In molecular biology, microRNA (miRNA) sponges are RNA transcripts that compete with other RNA transcripts for binding with miRNAs. Research has shown that miRNA sponges have a fundamental impact on tissue development and disease progression. Generally, to achieve a specific biological function, miRNA sponges tend to form modules or communities in a biological system. However, tools that help researchers infer and analyse miRNA sponge modules from heterogeneous data are still lacking. To fill this gap, we develop an R/Bioconductor package, miRSM, to facilitate the inference and analysis of miRNA sponge modules. miRSM provides a collection of 50 co-expression analysis methods to identify gene co-expression modules (which are candidate miRNA sponge modules), four module discovery methods to infer miRNA sponge modules and seven modular analysis methods for investigating miRNA sponge modules. miRSM will enable researchers to quickly infer and analyse miRNA sponge modules in new datasets, and will consequently accelerate research on miRNA sponges.

KEYWORDS: miRNA, lncRNA, ceRNA, miRNA sponge modules, modular analysis

Introduction

Approximately 21 nucleotides in length, microRNAs (miRNAs) are small non-coding post-transcriptional regulators that play critical roles in many biological processes and complex human diseases [1–4]. By competing for common miRNA response elements, different RNA transcripts, both coding and non-coding, influence each other’s expression levels through crosstalk, thereby reducing the amount of miRNA transcripts in a cell [5]. These RNA transcripts, termed competing endogenous RNAs (ceRNAs) or miRNA sponges, include long non-coding RNAs (lncRNAs), circular RNAs (circRNAs), pseudogenes and messenger RNAs (mRNAs). miRNA sponges have been found to sequester miRNAs from their natural targets, whose expression consequently increases, with a physiological impact on many cellular processes [6]. A growing body of evidence shows that miRNA sponges play important roles in pathological conditions, such as human cancer [6–8]. For instance, experimental evidence shows that the pseudogene PTENP1 acts as a miRNA sponge that regulates the expression level of the tumour suppressor gene PTEN by providing additional miRNA binding sites in human cells, and exerts a growth-inhibitory role in a DICER-dependent manner [9].

Genes tend to form modules or communities to perform a certain biological function [10]. Thus, it is necessary to investigate the function of miRNA sponges (which are genes) at the module level. However, there is currently only one tool, the miRspongeR R package [11], available for inferring miRNA sponge modules, and it uses a graph-based clustering strategy. Specifically, miRspongeR identifies miRNA sponge modules from pre-identified miRNA sponge interaction networks, using one of the four graph clustering algorithms provided in the package. The pre-identified miRNA sponge interaction networks can be generated by three categories of computational methods: pair-wise correlation-based approaches, partial association-based approaches and mathematical modelling approaches [7]. However, the methods used in miRspongeR have two main limitations, as mentioned in [12]. First, it is not clear whether the clustered miRNA sponges and target genes in an identified module have a high collective positive correlation. Here, the collective positive correlation helps to assess whether the modular competition between the clustered miRNA sponges and target genes in a module is strong. Second, the methods in miRspongeR do not consider the influence of common miRNAs on the expression of the miRNA sponges and miRNA targets in an identified module. Since the identified miRNA sponge modules are mediated by miRNAs, it is necessary to consider the influence of common miRNAs.

Heterogeneous data contain different kinds of information and allow us to consider multiple views when solving a specific problem [13]. Several methods [14–17] have been proposed to integrate heterogeneous data for inferring miRNA sponge modules. However, these approaches still have the above-mentioned limitations. To address these limitations, it is desirable to study miRNA sponge modules using heterogeneous data, such as expression data and putative miRNA–target interactions, and to develop new tools for miRNA sponge module identification. Therefore, in this paper, we present a new R/Bioconductor package, miRSM, which provides a pipeline for the identification and analysis of miRNA sponge modules from heterogeneous data (including expression data and putative miRNA–target interactions).

miRSM makes the following four contributions to inferring and analysing miRNA sponge modules from heterogeneous data. First, miRSM is a comprehensive R package for discovering miRNA sponge modules from heterogeneous data. Second, miRSM implements 50 different methods for identifying gene co-expression modules and four module discovery methods for identifying miRNA sponge modules. Third, we provide a parameter selection guideline for users to identify miRNA sponge modules in their own datasets. Finally, we propose an evaluation for each stage of the pipeline to guide users in choosing the preferred methods for their own datasets.

Methods

Design and implementation

As shown in Figure 1, the miRSM pipeline contains three stages. The required input data include matched miRNA and gene (e.g. lncRNA, circRNA, pseudogene and mRNA) expression data, and putative miRNA–target interactions. The putative miRNA–target interactions can be miRNA–mRNA, miRNA–lncRNA, miRNA–circRNA and miRNA–pseudogene interactions, etc. A miRNA–target interaction refers to the relationship between a miRNA and one of its validated or predicted target genes (i.e. mRNA, lncRNA, circRNA or pseudogene). In Stage 1, to allow comparisons between two different groups of coding or non-coding RNAs, miRSM expects two matched expression datasets, RNA₁ and RNA₂, as input for identifying co-expression modules. Using one of the (bi)clustering methods implemented in the package, miRSM infers RNA₁-RNA₂ co-expression modules from the matched RNA₁ and RNA₂ expression data. In Stage 2, by integrating the RNA₁-RNA₂ co-expression modules identified in Stage 1 with miRNA expression data and putative miRNA–target interactions, miRSM regards a co-expression module as a miRNA sponge module if the group of RNA₁ and the group of RNA₂ in the module have (i) a significant number of common miRNAs, (ii) high matrix correlation (correlation between two matrices, e.g. canonical correlation) between their expression levels, and (iii) adequate sensitivity matrix correlation (correlation between two matrices conditional on another matrix, e.g. sensitivity canonical correlation) conditioning on the expression levels of their common miRNAs. After the miRNA sponge modules meeting these three criteria are obtained, modular analysis is performed in Stage 3 to investigate the information underlying the modules and to discover functional modules associated with biological processes. In the following, we describe these stages in detail.

Co-expression analysis methods for inferring RNA₁-RNA₂ co-expression modules

At the expression level, gene co-expression modules are groups of genes with highly correlated expression patterns. Gene co-expression modules help to identify miRNA sponge modules with strong competition between a group of RNA₁ and a group of RNA₂, since modules with high matrix correlation between the two groups tend to show gene co-expression patterns. For finding gene co-expression modules in Stage 1, miRSM collects 50 state-of-the-art co-expression analysis methods, as listed in Figure 2, including clustering [18], biclustering [19] and network clustering [20] algorithms. Users can apply the 50 co-expression analysis methods to infer RNA₁-RNA₂ co-expression modules by calling seven utility functions in miRSM (module_WGCNA, module_GFA, module_igraph, module_ProNet, module_NMF, module_clust and module_biclust). The usage of these co-expression analysis methods and the related references can be found at https://bioconductor.org/packages/release/bioc/manuals/miRSM/man/miRSM.pdf.

Figure 2. — Co-expression analysis methods for inferring gene co-expression modules. The number of clustering, biclustering and network clustering methods included in *miRSM* is 18, 21 and 11, respectively

Module discovery methods for inferring miRNA sponge modules

The module discovery methods of identifying miRNA sponge modules in miRSM are based on the miRNA sponge modular competition hypothesis [12]. In Stage 2, from the RNA₁-RNA₂ co-expression modules identified in Stage 1, miRSM identifies miRNA sponge modules by following the three criteria described previously. For the first criterion, based on putative miRNA–target interactions, miRSM uses a hypergeometric test to calculate the significance of common miRNAs between the group of RNA₁ and the group of RNA₂ in a co-expression module as follows:

$$p\text{-value} = 1 - \sum_{i=0}^{y-1} \frac{\binom{M}{i} \binom{N-M}{K-i}}{\binom{N}{K}}$$

(1)

where N denotes the number of miRNAs in the miRNA expression data, M and K represent the numbers of miRNAs regulating the group of RNA₁ and the group of RNA₂ in the co-expression module, respectively, and y is the number of miRNAs shared by the group of RNA₁ and the group of RNA₂ in the co-expression module.
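As an illustration of Equation (1), the following minimal Python sketch evaluates the upper-tail hypergeometric p-value with exact rational arithmetic. It is not the miRSM implementation; the function name `common_mirna_pvalue` and its argument names are hypothetical and simply mirror the symbols N, M, K and y defined above.

```python
from fractions import Fraction
from math import comb


def common_mirna_pvalue(n_total: int, m_rna1: int, k_rna2: int, y_shared: int) -> float:
    """Upper-tail hypergeometric p-value of Eq. (1), i.e. P(X >= y_shared).

    n_total  : N, number of miRNAs in the miRNA expression data
    m_rna1   : M, number of miRNAs regulating the RNA1 group
    k_rna2   : K, number of miRNAs regulating the RNA2 group
    y_shared : y, number of miRNAs shared by the two groups
    """
    if not (0 <= m_rna1 <= n_total and 0 <= k_rna2 <= n_total):
        raise ValueError("M and K must lie between 0 and N")
    if not 0 <= y_shared <= min(m_rna1, k_rna2):
        raise ValueError("y must lie between 0 and min(M, K)")
    denominator = comb(n_total, k_rna2)
    # Probability mass of observing exactly 0 .. y-1 shared miRNAs.
    lower_tail = sum(
        Fraction(comb(m_rna1, i) * comb(n_total - m_rna1, k_rna2 - i), denominator)
        for i in range(y_shared)
    )
    return float(1 - lower_tail)


if __name__ == "__main__":
    # Toy numbers chosen only for illustration (not taken from the paper).
    print(common_mirna_pvalue(n_total=10, m_rna1=5, k_rna2=5, y_shared=5))
```

Exact fractions avoid the cancellation error that can occur when subtracting a floating-point sum close to 1; for large N a library routine for the hypergeometric survival function would be the more usual choice.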

For the second criterion, to calculate the matrix correlation ( $MC_{RNA_{1}-RNA_{2}}$ ) between the group of RNA₁ and the group of RNA₂ in a co-expression module, miRSM provides three state-of-the-art matrix correlation methods: Canonical Correlation [21], Distance Correlation [22] and RV Coefficient (the subspace extension of the R² coefficient between two random vectors) [23]. The Canonical Correlation and RV Coefficient are linear measures, and the Distance Correlation is a non-linear measure.

For the third criterion, miRSM provides three measures, Sensitivity Canonical Correlation (SCC), Sensitivity Distance Correlation (SDC) and Sensitivity RV Coefficient (SRVC), to compute the sensitivity matrix correlation ( $SMC_{RNA_{1}-RNA_{2}}$ ). For each measure, the $SMC_{RNA_{1}-RNA_{2}}$ between the group of RNA₁ and the group of RNA₂ in a co-expression module is calculated as follows:

$$SMC_{RNA_{1}-RNA_{2}} = MC_{RNA_{1}-RNA_{2}} - PMC_{RNA_{1}-RNA_{2}}$$

(2)

where $MC_{RNA_{1}-RNA_{2}}$ is the matrix correlation between the group of RNA₁ and the group of RNA₂ mentioned in the second criterion, and $PMC_{RNA_{1}-RNA_{2}}$ is the partial matrix correlation between the group of RNA₁ and the group of RNA₂, i.e. the matrix correlation between the two groups of RNAs conditioning on common miRNAs.

$PMC_{RNA_{1}-RNA_{2}}$ can be computed as:

$$PMC_{RNA_{1}-RNA_{2}} = \begin{cases} \dfrac{MC_{RNA_{1}-RNA_{2}} - MC_{miR-RNA_{1}} MC_{miR-RNA_{2}}}{\sqrt{1 - MC_{miR-RNA_{1}}^{2}} \sqrt{1 - MC_{miR-RNA_{2}}^{2}}}, & MC_{miR-RNA_{1}} \neq 1 \text{ and } MC_{miR-RNA_{2}} \neq 1 \\ 0, & MC_{miR-RNA_{1}} = 1 \text{ or } MC_{miR-RNA_{2}} = 1 \end{cases}$$

(3)

where $MC_{miR-RNA_{1}}$ ( $MC_{miR-RNA_{2}}$ ) is the matrix correlation between the common miRNAs in the co-expression module and the group of RNA₁ (RNA₂) in the co-expression module.
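Equations (2) and (3) can be sketched in a few lines of Python once the three matrix correlations are available. The sketch below assumes the matrix correlations have already been computed by some method (canonical, distance or RV) and lie in [0, 1]; the function names are hypothetical and this is not the package's code.

```python
from math import sqrt


def partial_matrix_correlation(mc_12: float, mc_mir_1: float, mc_mir_2: float) -> float:
    """Eq. (3): matrix correlation of RNA1 and RNA2 conditioning on common miRNAs.

    mc_12    : MC between the RNA1 group and the RNA2 group
    mc_mir_1 : MC between the common miRNAs and the RNA1 group
    mc_mir_2 : MC between the common miRNAs and the RNA2 group
    """
    # Degenerate case of Eq. (3): the denominator would be zero.
    if mc_mir_1 == 1 or mc_mir_2 == 1:
        return 0.0
    numerator = mc_12 - mc_mir_1 * mc_mir_2
    denominator = sqrt(1 - mc_mir_1 ** 2) * sqrt(1 - mc_mir_2 ** 2)
    return numerator / denominator


def sensitivity_matrix_correlation(mc_12: float, mc_mir_1: float, mc_mir_2: float) -> float:
    """Eq. (2): SMC = MC - PMC."""
    return mc_12 - partial_matrix_correlation(mc_12, mc_mir_1, mc_mir_2)


if __name__ == "__main__":
    # Toy values for illustration only.
    print(sensitivity_matrix_correlation(0.8, 0.5, 0.5))
```

A positive SMC means that conditioning on the common miRNAs lowers the correlation between the two RNA groups, which is the behaviour the third criterion looks for.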

Parameter selection guideline for identifying miRNA sponge modules

In miRSM, several parameters of the module discovery methods must be set to identify miRNA sponge modules. First, the number of miRNAs shared by the group of RNA₁ and the group of RNA₂ should be at least 3, because we focus on investigating the competition between the group of RNA₁ and the group of RNA₂ conditioning on a group of common miRNAs. Moreover, the p-value of common miRNAs calculated with the hypergeometric test (criterion 1) should be at a significant level (e.g. less than 0.05).

In addition, the value of $MC_{RNA_{1}-RNA_{2}}$ (ranging from 0 to 1) quantifies the strength of competition between the group of RNA₁ and the group of RNA₂, and larger $MC_{RNA_{1}-RNA_{2}}$ values indicate stronger competition. Normally, an $MC_{RNA_{1}-RNA_{2}}$ value larger than 0.8 is regarded as indicating a good level of competition. Therefore, the suggested cut-off value of $MC_{RNA_{1}-RNA_{2}}$ (criterion 2) is at least 0.8.

Finally, the value of $SMC_{RNA_{1}-RNA_{2}}$ reflects the influence of the group of common miRNAs on the competition between the group of RNA₁ and the group of RNA₂. To evaluate the statistical significance of this influence, miRSM can apply the null model presented in [24]. The null hypothesis is that the group of common miRNAs has no influence on the value of $MC_{RNA_{1}-RNA_{2}}$ between the group of RNA₁ and the group of RNA₂, i.e. the value of $SMC_{RNA_{1}-RNA_{2}}$ between the two groups is 0. To obtain more precise significance p-values, the number of datasets sampled under the null model should be large (e.g. at least 1E+06). In this work, the cut-off value of $SMC_{RNA_{1}-RNA_{2}}$ depends on the cut-off value of $MC_{RNA_{1}-RNA_{2}}$ and the number of samples. Given the number of samples, larger values of $SMC_{RNA_{1}-RNA_{2}}$ and $MC_{RNA_{1}-RNA_{2}}$ correspond to smaller significance p-values in the null model. For example, if the number of samples is 100, the first parameter setting ( $MC_{RNA_{1}-RNA_{2}} = 0.8$ , $SMC_{RNA_{1}-RNA_{2}} = 0.1$ ) and the second parameter setting ( $MC_{RNA_{1}-RNA_{2}} = 0.85$ , $SMC_{RNA_{1}-RNA_{2}} = 0.15$ ) correspond to significance p-values of 1.04E-02 and 1.08E-03, respectively (1E+06 datasets sampled in the null model). This means that, given the value of $MC_{RNA_{1}-RNA_{2}}$ (0.8 or 0.85) and the number of samples (100), the value of $SMC_{RNA_{1}-RNA_{2}}$ (0.1 or 0.15) indicates that the influence of the group of common miRNAs on the competition between the group of RNA₁ and the group of RNA₂ is statistically significant (p-value less than 0.05). Therefore, if the cut-off value of $MC_{RNA_{1}-RNA_{2}}$ is set to 0.8 or 0.85 and the number of samples is 100, a cut-off value of $SMC_{RNA_{1}-RNA_{2}}$ of 0.1 or 0.15, respectively, is reasonable for identifying miRNA sponge modules at a significant level.

Empirically, by default in miRSM, the cut-off values of the significance p-value (criterion 1), $MC_{RNA_{1}-RNA_{2}}$ (criterion 2) and $SMC_{RNA_{1}-RNA_{2}}$ (criterion 3) are set to 0.05, 0.8 and 0.1, respectively. Users can change these cut-off values according to their needs.
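The three criteria and their default cut-offs can be combined into a single filter, as in the following minimal Python sketch. The `ModuleStats` container and the function `is_sponge_module` are hypothetical names introduced for illustration; the strict inequalities for MC and SMC are an assumption (the guideline above says "at least 0.8", whereas the case study uses "more than 0.8").

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    """Hypothetical summary statistics of one RNA1-RNA2 co-expression module."""
    n_common_mirnas: int  # number of miRNAs shared by the RNA1 and RNA2 groups
    p_value: float        # hypergeometric p-value of the shared miRNAs (criterion 1)
    mc: float             # matrix correlation (criterion 2)
    smc: float            # sensitivity matrix correlation (criterion 3)


def is_sponge_module(
    stats: ModuleStats,
    min_common: int = 3,
    p_cutoff: float = 0.05,
    mc_cutoff: float = 0.8,
    smc_cutoff: float = 0.1,
) -> bool:
    """Return True if a module passes all three criteria with the given cut-offs."""
    return (
        stats.n_common_mirnas >= min_common
        and stats.p_value < p_cutoff
        and stats.mc > mc_cutoff      # assumption: strict inequality
        and stats.smc > smc_cutoff    # assumption: strict inequality
    )


if __name__ == "__main__":
    candidate = ModuleStats(n_common_mirnas=5, p_value=0.01, mc=0.85, smc=0.15)
    print(is_sponge_module(candidate))
```

Keeping the cut-offs as keyword arguments mirrors the point made above that users may adjust them for their own data.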

Modular analysis of miRNA sponge modules

Performing modular analysis of miRNA sponge modules could help to discover functional modules associated with cancer progression and development, e.g. module biomarkers [12,25]. In Stage 3, for investigating the identified miRNA sponge modules, miRSM has implemented seven modular analysis methods as follows:

Functional enrichment analysis
Cancer enrichment analysis
Validation analysis
Pair-wise co-expression analysis
miRNA distribution analysis
miRNA target prediction
Identification of miRNA sponge interactions

Functional enrichment analysis

miRSM performs six types of functional enrichment analysis, including Gene Ontology (GO) [26], Kyoto Encyclopaedia of Genes and Genomes Pathway (KEGG) [27], Reactome Pathway [28], Disease Ontology (DO) [29], DisGeNET [30] and Network of Cancer Genes (NCG) [31] enrichment analysis. A hypergeometric test is used to compute the significance p-values of the identified miRNA sponge modules enriched in GO, KEGG, Reactome, DO, DisGeNET and NCG terms. A miRNA sponge module which is significantly enriched in at least one of GO, KEGG, Reactome, DO, DisGeNET and NCG terms (e.g. significance p-value <0.05) is defined as a functional module.

Cancer enrichment analysis

For cancer enrichment analysis, users can collect cancer-related genes of interest and further check whether the identified miRNA sponge modules are significantly enriched in them or not. In miRSM, a hypergeometric test is used to compute the significance p-values of the identified miRNA sponge modules enriched in the collected cancer-related genes.

Validation analysis

In the validation analysis, users can check which experimentally validated miRNA sponge interactions exist in the identified miRNA sponge modules. In each miRNA sponge module, miRSM takes the RNA₁–RNA₂ pairs as predicted miRNA sponge interactions. For each miRNA sponge module, the overlap between the predicted miRNA sponge interactions and the ground truth gives the experimentally validated miRNA sponge interactions in that module.
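The overlap described above amounts to a set intersection. The following minimal Python sketch assumes that interactions are undirected (so each pair is stored as a `frozenset`) and that identifiers are matched by exact name; the function name and the gene labels are hypothetical placeholders.

```python
from itertools import product


def validated_interactions(rna1, rna2, ground_truth):
    """Return the predicted RNA1-RNA2 pairs of one module found in the ground truth.

    rna1, rna2   : identifiers of the two RNA groups in the module
    ground_truth : iterable of (sponge, target) pairs from curated databases
    """
    # Every RNA1-RNA2 combination in the module is a predicted sponge interaction.
    predicted = {frozenset(pair) for pair in product(rna1, rna2)}
    # Assumption: interactions are undirected, so pair order is ignored.
    truth = {frozenset(pair) for pair in ground_truth}
    return predicted & truth


if __name__ == "__main__":
    hits = validated_interactions(
        rna1=["lncA", "lncB"],
        rna2=["mrnaX", "mrnaY"],
        ground_truth=[("mrnaX", "lncA"), ("lncC", "mrnaY")],
    )
    print(sorted(sorted(pair) for pair in hits))
```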

Pair-wise co-expression analysis

The collective positive correlation only reflects the co-expression level between the group of RNA₁ and the group of RNA₂, rather than the pair-wise co-expression level of RNA₁–RNA₂ pairs. To evaluate whether the RNA₁–RNA₂ pairs in a miRNA sponge module are statistically co-expressed, miRSM calculates the average (mean and median) absolute Pearson correlation of all RNA₁–RNA₂ pairs in each module as a measure of the overall pair-wise co-expression level between the RNA₁ and RNA₂ groups. For each miRNA sponge module, miRSM performs a permutation test by randomly generating G (e.g. 1000) modules with the same numbers of RNA₁ and RNA₂ to compute the statistical significance of the co-expression level. If the overall co-expression level of an identified miRNA sponge module is significantly higher than that of its corresponding random modules (e.g. significance p-value <0.05), the RNA₁–RNA₂ pairs in the module are regarded as statistically co-expressed.
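The permutation test can be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not the miRSM code: random modules are drawn from the full lists of RNA₁ and RNA₂ in the expression data, the mean (not the median) absolute correlation is used, and the p-value uses the common (hits + 1) / (G + 1) convention. All function and variable names are hypothetical.

```python
import random
from itertools import product
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_correlation(expr, rna1, rna2):
    """Mean absolute Pearson correlation over all RNA1-RNA2 pairs."""
    return mean(abs(pearson(expr[a], expr[b])) for a, b in product(rna1, rna2))


def coexpression_pvalue(expr, rna1, rna2, all_rna1, all_rna2, n_random=1000, seed=0):
    """Permutation p-value of a module's overall pair-wise co-expression level.

    expr               : dict mapping RNA identifier -> list of expression values
    rna1, rna2         : members of the module under test
    all_rna1, all_rna2 : lists of all RNA1 / RNA2 identifiers to sample from
    """
    rng = random.Random(seed)
    observed = mean_abs_correlation(expr, rna1, rna2)
    hits = 0
    for _ in range(n_random):
        # Random module with the same numbers of RNA1 and RNA2 as the real one.
        rand1 = rng.sample(all_rna1, len(rna1))
        rand2 = rng.sample(all_rna2, len(rna2))
        if mean_abs_correlation(expr, rand1, rand2) >= observed:
            hits += 1
    # Assumption: add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_random + 1)
```

With G = 1000 random modules the smallest attainable p-value under this convention is 1/1001, which is sufficient for the 0.05 threshold mentioned above.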

miRNA distribution analysis

As mediators, miRNAs would probably mediate more than one miRNA sponge module. To investigate the distribution of the miRNAs shared by the identified miRNA sponge modules, the miRNA distribution analysis can be used to understand whether the shared miRNAs act as crosslinks across different miRNA sponge modules. Here, miRNAs mediating at least two miRNA sponge modules are regarded as crosslinks.
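Finding crosslink miRNAs reduces to counting, for each miRNA, the number of modules it mediates. The Python sketch below assumes each module is given as a list of its common miRNAs; the function name and the module and miRNA labels are hypothetical.

```python
from collections import Counter


def crosslink_mirnas(module_mirnas, min_modules=2):
    """Return miRNAs mediating at least `min_modules` modules, with their counts.

    module_mirnas : dict mapping module name -> iterable of common miRNAs
    """
    # set() guards against a miRNA being listed twice within one module.
    counts = Counter(
        mirna for mirnas in module_mirnas.values() for mirna in set(mirnas)
    )
    return {mirna: n for mirna, n in counts.items() if n >= min_modules}


if __name__ == "__main__":
    modules = {
        "module1": ["miR-a", "miR-b"],
        "module2": ["miR-b", "miR-c"],
        "module3": ["miR-a", "miR-b"],
    }
    print(crosslink_mirnas(modules))
```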

miRNA target prediction

Using the identified miRNA sponge modules and their common miRNAs, miRSM can predict miRNA–target interactions, including miRNA–RNA₁ and miRNA–RNA₂ interactions. miRSM predicts miRNA targets based on two rules. First, the RNA₁ and RNA₂ in each module have significant common miRNAs at the sequence level. Second, the RNA₁ and RNA₂ in each module are highly correlated at the expression level. Consequently, given a miRNA sponge module and its common miRNAs, the miRNA–RNA₁ and miRNA–RNA₂ pairs are treated as potential miRNA–target interaction pairs in miRSM.

Identification of miRNA sponge interactions

Based on the identified miRNA sponge modules, miRSM can also identify the miRNA sponge interactions within them. Following the two rules described in Section miRNA target prediction, the RNA₁–RNA₂ pairs are also regarded as potential miRNA sponge interaction pairs in miRSM. Therefore, miRSM merges the RNA₁–RNA₂ pairs of all miRNA sponge modules to obtain a miRNA sponge interaction network.
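Both the miRNA–target pairs and the merged sponge interaction network are Cartesian products over module members. The following Python sketch assumes each module is a dictionary with the hypothetical keys `mirnas`, `rna1` and `rna2`; it illustrates the idea and is not the package's implementation.

```python
from itertools import product


def mirna_target_pairs(module):
    """Pair every common miRNA with every RNA1 and RNA2 member of one module."""
    members = list(module["rna1"]) + list(module["rna2"])
    return set(product(module["mirnas"], members))


def sponge_interaction_network(modules):
    """Merge the RNA1-RNA2 pairs of all modules into one deduplicated edge list."""
    edges = set()
    for module in modules:
        edges.update(product(module["rna1"], module["rna2"]))
    return sorted(edges)


if __name__ == "__main__":
    modules = [
        {"mirnas": ["miR-a"], "rna1": ["lncA"], "rna2": ["mrnaX", "mrnaY"]},
        {"mirnas": ["miR-b"], "rna1": ["lncA", "lncB"], "rna2": ["mrnaX"]},
    ]
    print(sponge_interaction_network(modules))
```

Storing edges in a set means that a pair occurring in several modules appears only once in the merged network.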

Evaluation of methods to infer miRNA sponge modules

In miRSM, inferring miRNA sponge modules involves two stages (Stage 1 and Stage 2). Given a dataset, selecting the preferred methods for inferring miRNA sponge modules requires an evaluation of the different co-expression analysis methods (Stage 1) and module discovery methods (Stage 2). Let r_ij denote the rank score of the ith method in terms of the jth indicator and w_j denote the weight of the jth indicator. An indicator is a measurable characteristic that can be used to show the performance of different methods. The overall rank score ors_i of the ith method in terms of m indicators is calculated as:

$$ors_{i} = \sum_{j=1}^{m} w_{j} r_{ij}, \qquad \sum_{j=1}^{m} w_{j} = 1, \quad w_{j} > 0$$

(4)

Specifically, if all indicators contribute equally to the evaluation of the different methods (i.e. w_j = 1/m for every j), the overall rank score ors_i is computed as follows:

$$ors_{i} = \frac{1}{m} \sum_{j=1}^{m} r_{ij}$$

(5)

Among all the methods (co-expression analysis methods or module discovery methods) considered, the method with the largest overall rank score is regarded as the preferred method.

Results

In this section, to help users study miRNA sponge modules in their own datasets with miRSM, we provide a case study on inferring and analysing miRNA sponge modules in breast cancer (BRCA). In the case study, we are interested in inferring lncRNA-related miRNA sponge modules. Note that miRSM can also be applied to infer miRNA sponge modules involving other types of RNA (circRNA, pseudogene, etc.). For example, to infer circRNA-related miRNA sponge modules, the input data for miRSM should be matched miRNA, circRNA and mRNA expression data, and putative miRNA–target (miRNA–circRNA and miRNA–mRNA) interactions. For the case study, the BRCA dataset is used, and the R scripts can be obtained from https://github.com/zhangjunpeng411/miRSM_Supp.

Data preparation

In this case study, miRSM requires gene expression data (matched miRNA, lncRNA and mRNA expression data) and putative miRNA–target interactions as input data for inferring miRNA sponge modules.

The matched miRNA and gene expression data of BRCA are obtained from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) [32]. The expression values of the BRCA dataset are in Fragments Per Kilobase Million (FPKM) units and are pre-processed using ${log}_{2} (x + 1)$ . We only use the BRCA expression data from 72 individuals for whom tumour and normal samples are available from the same patient. A miRNA or gene with missing values in more than 10% of the samples is removed, and the remaining missing values are imputed using the k-nearest neighbour method in the impute R package [33]. Using the HGNC (HUGO Gene Nomenclature Committee) gene annotation platform [34], genes are annotated as lncRNAs, mRNAs, pseudogenes, etc. Based on the lncRNA and mRNA annotation information of interest, we further extract lncRNA and mRNA expression (i.e. the expression profiles of RNA₁ and RNA₂) from the gene expression data. After differential gene expression analysis using the limma-trend approach in the limma R package [35], we identify 161 miRNAs, 364 lncRNAs and 5370 mRNAs that are differentially expressed at a significant level (adjusted p-value <1E-04, adjusted by the Benjamini & Hochberg method).
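The filtering and transformation steps described above can be sketched in Python as follows. The sketch assumes missing values are encoded as `None` and covers only the missing-value filter and the log₂(x + 1) transform; k-nearest neighbour imputation and differential expression analysis are deliberately left out. The function name and the gene labels are hypothetical.

```python
from math import log2


def preprocess(expr, max_missing_fraction=0.10):
    """Drop genes with too many missing values and apply log2(x + 1).

    expr : dict mapping gene identifier -> list of FPKM values (None = missing)
    """
    kept = {}
    for gene, values in expr.items():
        n_missing = sum(v is None for v in values)
        # Remove a gene when MORE than 10% of its samples are missing.
        if n_missing / len(values) > max_missing_fraction:
            continue
        # Remaining missing values stay None; imputation is out of scope here.
        kept[gene] = [None if v is None else log2(v + 1) for v in values]
    return kept


if __name__ == "__main__":
    toy = {
        "geneA": [3, 0, 1, 7, 15, 3, 3, 3, 3, None],        # 10% missing: kept
        "geneB": [3, None, 1, 7, 15, 3, 3, 3, 3, None],     # 20% missing: removed
    }
    print(sorted(preprocess(toy)))
```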

We combine the interactions from NPInter v3.0 [36] and LncBase v2.0 [37] to obtain putative miRNA–lncRNA interactions. As for putative miRNA–mRNA interactions, we integrate the interactions from miRTarBase v8.0 [38], TarBase v7.0 [39] and miRWalk v2.0 [40]. In total, we have obtained a list of 30,552 unique miRNA–target interactions (including 984 miRNA–lncRNA and 29,568 miRNA–mRNA interactions) between 161 miRNAs, 364 lncRNAs and 5370 mRNAs which are differentially expressed.

We obtain the BRCA related mRNAs from DisGeNET v5.0 [30] and COSMIC [41], and the BRCA related lncRNAs from LncRNADisease v2.0 [42], Lnc2Cancer v2.0 [43] and MNDR v2.0 [44]. In total, we have collected 4819 BRCA related genes (including lncRNAs and mRNAs).

The ground truth of lncRNA-related miRNA sponge interactions is obtained by combining the interactions from three databases including miRSponge [45], LncCeRBase [46] and LncACTdb v2.0 [47]. As a result, we have collected 581 lncRNA-related miRNA sponge interactions as the ground truth for the validation.

Co-expression analysis methods display diverse performance in identifying lncRNA–mRNA co-expression modules

In Stage 1, we select seven frequently used methods from the 50 built-in (bi)clustering methods in miRSM, namely WGCNA [48], GFA [49], greedy [50], MCL [51], NMF [52], k-means [53] and fabia [54], to identify lncRNA-mRNA co-expression modules. Other methods, described in the package’s help documentation, can also be selected. For the greedy and MCL methods, the significance p-value cut-off of positive correlation is set to 1E-10 to obtain a lncRNA-mRNA co-expression network of moderate size. For the methods that require the number of co-expression modules to be specified, the maximum number of modules to be identified is set to 72 (half of the sample size of the gene expression data). Each lncRNA-mRNA co-expression module must contain at least two lncRNAs and two mRNAs.

As shown in Figure 3, the seven co-expression analysis methods differ in both the number and the average size of the lncRNA-mRNA co-expression modules they identify, showing that their performance varies. Since each of the seven co-expression analysis methods generates at least one lncRNA-mRNA co-expression module, we use the modules identified by all seven methods for miRNA sponge module identification in the following section.

Overlap and difference between lncRNA related miRNA sponge modules identified by different module discovery methods

Based on the identified lncRNA-mRNA co-expression modules, we identify lncRNA-related miRNA sponge modules (LncmiRSMs) using the SCC, SDC and SRVC methods in Stage 2. That is, a lncRNA-mRNA co-expression module is considered a LncmiRSM if its group of lncRNAs and its group of mRNAs simultaneously meet the above three criteria. By default in miRSM, a lncRNA-mRNA co-expression module with p-value <0.05 for the hypergeometric test of common miRNAs (criterion 1), Matrix Correlation (MC) greater than 0.8 (criterion 2) and Sensitivity Matrix Correlation (SMC) greater than 0.1 (criterion 3) is regarded as a LncmiRSM. According to the parameter selection guideline in Section Methods, the cut-off values of MC (0.8) and SMC (0.1) are reasonable, because all the identified LncmiRSMs are statistically significant under the null model (p-values less than 4.01E-03 with 1E+06 datasets sampled).

As shown in Figure 4, across the seven co-expression analysis methods, the three miRNA sponge module identification methods perform differently in identifying LncmiRSMs. Regardless of the co-expression analysis method used, SCC and SDC always infer a portion of identical LncmiRSMs. For five co-expression analysis methods (GFA, greedy, NMF, k-means and fabia), there is an overlap between the LncmiRSMs identified by SCC, SDC and SRVC. Moreover, for the WGCNA and MCL co-expression analysis methods, no LncmiRSMs are identified by SRVC. Since the SCC, SDC and SRVC methods use different measures to calculate $MC_{RNA_{1}-RNA_{2}}$ (criterion 2) and $SMC_{RNA_{1}-RNA_{2}}$ (criterion 3), the percentage of LncmiRSMs (the percentage of co-expression modules acting as LncmiRSMs) identified by them also differs for each of the seven co-expression analysis methods. These results indicate that the identified lncRNA-related miRNA sponge modules depend on both the co-expression analysis method and the miRNA sponge module identification method. Detailed information on the lncRNA-related miRNA sponge modules can be found in Supplementary File S1.

Preferred methods to identify lncRNA related miRNA sponge modules

In miRSM, identifying gene co-expression modules is an important step for the subsequent identification of miRNA sponge modules. Thus, given a dataset (e.g. the BRCA dataset in the case study), it is necessary to select a preferred co-expression analysis method to identify gene co-expression modules in Stage 1. In this case study, we choose five indicators (number of co-expression modules, average size of co-expression modules, percentage of LncmiRSMs using SCC, percentage of LncmiRSMs using SDC and percentage of LncmiRSMs using SRVC) to evaluate the performance of the seven co-expression analysis methods. We assume that the five indicators contribute equally to the evaluation. For each co-expression analysis method, a larger number of co-expression modules (providing more candidates for miRNA sponge modules), a higher percentage of LncmiRSMs using SCC, a higher percentage of LncmiRSMs using SDC and a higher percentage of LncmiRSMs using SRVC each yield a higher rank score. Since larger modules are often too general to provide biological insight and guide follow-up wet-lab experiments [10], a larger average size of co-expression modules yields a lower rank score. As shown in Figure 5, the fabia method has the largest overall rank score, and is thus regarded as the preferred co-expression analysis method for identifying lncRNA-mRNA co-expression modules in the BRCA dataset.

Figure 5. — Rank of seven co-expression analysis methods. (A)–(E) Rank of seven co-expression analysis methods under five evaluating indicators, including number of co-expression modules, average size of co-expression modules, percentage of LncmiRSMs using *SCC*, percentage of LncmiRSMs using *SDC*, and percentage of LncmiRSMs using *SRVC*. (F) Overall rank score of seven co-expression analysis methods

Since the fabia method is the preferred co-expression analysis method in the case study, we choose it to identify lncRNA-mRNA co-expression modules in Stage 1. In Stage 2, as in the selection of the preferred co-expression analysis method, we use the overall rank score to choose a preferred method for identifying miRNA sponge modules. Here, we choose four indicators (percentage of functional LncmiRSMs, percentage of BRCA-related LncmiRSMs, number of validated miRNA sponge interactions, and percentage of significant co-expression LncmiRSMs) to evaluate the performance of the miRNA sponge module identification methods (SCC, SDC and SRVC). We again assume that the four indicators contribute equally to the evaluation.

In this work, a LncmiRSM which is significantly enriched in at least one of GO, KEGG, Reactome, DO, DisGeNET and NCG terms (significance p-value <0.05) is defined as a functional module. Functional enrichment analysis shows that the percentage of functional modules is 90.63% (29 out of 32), 97.62% (41 out of 42), and 92.68% (38 out of 41) for the SCC, SDC and SRVC methods, respectively (see details in Supplementary File S2).

Moreover, for the SCC, SDC and SRVC methods, the percentage of BRCA-related modules (significantly enriched in BRCA genes with p-value <0.05) is 40.63% (13 out of 32), 45.24% (19 out of 42) and 39.02% (16 out of 41), respectively. By using the limited ground truth of lncRNA-related miRNA sponge interactions, we have discovered that 11, 18 and 14 LncmiRSMs identified by the SCC, SDC and SRVC methods, respectively, each contains at least one experimentally validated lncRNA-related miRNA sponge interaction. In total, for the SCC, SDC and SRVC methods, the numbers of experimentally validated lncRNA-related miRNA sponge interactions are 24, 39 and 25, respectively (see details in Supplementary File S3).

Here, a LncmiRSM is regarded as a significant co-expression module if its overall co-expression level, measured as the mean absolute correlation between sponge lncRNAs and mRNAs, is significantly higher than that of 1000 corresponding random modules. As a result, for the SCC, SDC and SRVC methods, the percentage of significant co-expression LncmiRSMs is 62.50% (20 out of 32), 80.95% (34 out of 42), and 60.98% (25 out of 41), respectively (see details in Supplementary File S4).
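The random-module comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the miRSM implementation: the function names, the toy expression data, the sampling of random modules of the same size from the background gene pools, and the empirical p-value formula (hits + 1) / (n_random + 1) are all assumptions made for the example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / sqrt(sxx * syy)

def mean_abs_corr(expr, lncrnas, mrnas):
    """Mean absolute correlation over all lncRNA-mRNA pairs of a module."""
    vals = [abs(pearson(expr[l], expr[m])) for l in lncrnas for m in mrnas]
    return sum(vals) / len(vals)

def coexpression_p_value(expr, module_lnc, module_mrna, all_lnc, all_mrna,
                         n_random=1000, seed=0):
    """Empirical p-value: share of random modules (same size, assumed to be
    drawn from the background pools) that are at least as co-expressed."""
    rng = random.Random(seed)
    observed = mean_abs_corr(expr, module_lnc, module_mrna)
    hits = 0
    for _ in range(n_random):
        rand_lnc = rng.sample(all_lnc, len(module_lnc))
        rand_mrna = rng.sample(all_mrna, len(module_mrna))
        if mean_abs_corr(expr, rand_lnc, rand_mrna) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)

# Hypothetical toy data: 20 lncRNAs and 20 mRNAs over 30 samples,
# where four genes share a common signal and the rest are noise.
data_rng = random.Random(1)
signal = [data_rng.gauss(0, 1) for _ in range(30)]
all_lnc = [f"lnc{i}" for i in range(20)]
all_mrna = [f"mrna{i}" for i in range(20)]
expr = {g: [data_rng.gauss(0, 1) for _ in range(30)] for g in all_lnc + all_mrna}
for g in ["lnc0", "lnc1", "mrna0", "mrna1"]:
    expr[g] = [s + 0.1 * data_rng.gauss(0, 1) for s in signal]

p_value = coexpression_p_value(expr, ["lnc0", "lnc1"], ["mrna0", "mrna1"],
                               all_lnc, all_mrna)
print(p_value < 0.05)
```

In this sketch a module would be counted as a significant co-expression module when the returned p-value is below 0.05.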

For each indicator, a miRNA sponge module identification method with a higher value (higher percentage of functional LncmiRSMs, higher percentage of BRCA-related LncmiRSMs, larger number of experimentally validated miRNA sponge interactions, or higher percentage of significant co-expression LncmiRSMs) obtains a higher rank score. As a result, for the SCC, SDC and SRVC methods, the overall rank score is 1.5, 3, and 1.5, respectively. Therefore, SDC, which has the largest overall rank score, is considered the preferred method for identifying LncmiRSMs in the BRCA dataset.
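As a worked check of these numbers, the following Python sketch ranks the three methods on each of the four indicators reported above and averages the ranks. It assumes that the overall rank score is the arithmetic mean of the per-indicator ranks with equal weights, which reproduces the reported values; the average-rank handling of ties is also an assumption and does not arise with these data.

```python
def rank_scores(values):
    """Rank values ascending: smallest gets 1, largest gets len(values).
    Tied values share the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

methods = ["SCC", "SDC", "SRVC"]
# Indicator values as reported in the text, in the order SCC, SDC, SRVC.
indicators = {
    "functional_pct": [90.63, 97.62, 92.68],
    "brca_related_pct": [40.63, 45.24, 39.02],
    "validated_interactions": [24, 39, 25],
    "significant_coexpression_pct": [62.50, 80.95, 60.98],
}

per_indicator = [rank_scores(v) for v in indicators.values()]
overall = {
    m: sum(r[i] for r in per_indicator) / len(per_indicator)
    for i, m in enumerate(methods)
}
print(overall)  # {'SCC': 1.5, 'SDC': 3.0, 'SRVC': 1.5}
```

SDC ranks first on all four indicators, whereas SCC and SRVC alternate between the two lower ranks, which is why both end at 1.5.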

In summary, for the BRCA dataset, the preferred methods for co-expression analysis in Stage 1 and module discovery in Stage 2 to infer lncRNA related miRNA sponge modules are fabia and SDC, respectively.

Modular analysis of lncRNA related miRNA sponge modules

According to the evaluation in the above section, the fabia method in Stage 1 and the SDC method in Stage 2 are chosen to identify LncmiRSMs in this case study. As a result, we have identified 42 LncmiRSMs in total. In Stage 3, we further conduct modular analysis of the identified LncmiRSMs. Functional enrichment analysis shows that the percentage of functional modules is 97.62% (41 out of 42), indicating that most of the identified LncmiRSMs are biologically meaningful. BRCA enrichment analysis indicates that the percentage of BRCA-related modules (significantly enriched in BRCA genes with p-value <0.05) is 45.24% (19 out of 42). This result implies that nearly half of the LncmiRSMs are closely associated with BRCA. Validation analysis shows that 18 LncmiRSMs together contain 39 experimentally validated lncRNA-related miRNA sponge interactions.

Pair-wise co-expression analysis indicates that the overall co-expression levels of 80.95% (34 out of 42) of the LncmiRSMs are significantly higher than those of 1000 corresponding random modules in terms of the mean value of absolute correlation between sponge lncRNAs and mRNAs (p-value <0.05). These results indicate that the sponge lncRNA–mRNA pairs in most of the LncmiRSMs are statistically co-expressed with each other.

The miRNA distribution analysis reveals that 97.89% (93 out of 95) of the miRNAs mediate at least two LncmiRSMs (see details in Supplementary File S5). This result implies that most of the mediating miRNAs act as crosslinks across different LncmiRSMs.
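The distribution statistic above amounts to counting, for each miRNA, the number of modules it mediates. A minimal Python sketch follows; the function names and the toy module-to-miRNA mapping are hypothetical and are not taken from the miRSM package or the BRCA results.

```python
from collections import Counter

def mirna_module_counts(module_mirnas):
    """Count, for each miRNA, how many modules it mediates."""
    counts = Counter()
    for mirnas in module_mirnas.values():
        counts.update(set(mirnas))  # count each miRNA once per module
    return counts

def fraction_shared(module_mirnas, min_modules=2):
    """Fraction of miRNAs that mediate at least `min_modules` modules."""
    counts = mirna_module_counts(module_mirnas)
    shared = sum(1 for c in counts.values() if c >= min_modules)
    return shared / len(counts)

# Hypothetical toy input: module name -> miRNAs shared within that module.
toy_modules = {
    "M1": ["miR-a", "miR-b"],
    "M2": ["miR-a", "miR-c"],
    "M3": ["miR-b", "miR-c", "miR-d"],
}
print(fraction_shared(toy_modules))  # 0.75: three of the four miRNAs are in two modules
```

Applied to the case study, the same ratio would be 93 / 95, i.e. 97.89%.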

Based on the identified LncmiRSMs and their common miRNAs, we have predicted 299,664 unique miRNA–target interactions. Moreover, we have identified 394,765 unique lncRNA-related miRNA sponge interactions. The predicted miRNA–target interactions and lncRNA-related miRNA sponge interactions are available at https://github.com/zhangjunpeng411/miRSM_Supp.

All lncRNA related miRNA sponge modules are potential biomarkers

In this section, we further evaluate whether the 42 identified LncmiRSMs (obtained with the fabia method in Stage 1 and the SDC method in Stage 2) could act as potential biomarkers to classify tumour and normal samples. To assess the classification performance of the genes in each LncmiRSM, we use the Support Vector Machine (SVM) classifier [55] with default parameters implemented in the e1071 R package [56]. We use the area under the receiver operating characteristic curve (AUC) with 10-fold cross-validation to evaluate the performance of each LncmiRSM. AUC denotes the probability that a randomly chosen ‘positive’ sample is ranked higher than a randomly chosen ‘negative’ one by the SVM. In this work, the LncmiRSMs with high AUC values (e.g. more than 0.99) are regarded as potential biomarkers. As a result, all of the 42 identified LncmiRSMs perform well in classifying tumour and normal samples, and are regarded as potential biomarkers (see details in Supplementary File S6).
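The probabilistic reading of AUC given above can be made concrete with a short Python sketch. It computes AUC directly as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, counting ties as one half. The scores below are hypothetical decision values standing in for SVM outputs; the sketch does not train a classifier or perform cross-validation.

```python
def auc(scores, labels):
    """AUC as the probability that a positive sample (label 1) scores
    higher than a negative sample (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical decision values: 1 = tumour, 0 = normal.
scores = [0.90, 0.80, 0.35, 0.40, 0.10]
labels = [1, 1, 1, 0, 0]
print(auc(scores, labels))  # 5 of the 6 positive-negative pairs are ordered correctly
```

Under 10-fold cross-validation, a value of this kind would be obtained from the held-out predictions; how the folds are combined into one AUC per module is not specified here and is left open in the sketch.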

Comparison with graph-based clustering methods

Community detection or graph clustering methods can also be used to identify miRNA sponge modules from a constructed miRNA sponge interaction network. Here, we refer to these module discovery methods as Graph-based Clustering (GC) methods. For the GC methods, we first choose two popular methods, Sensitivity Correlation (SC) [57] and Sparse Partial correlation ON Gene Expression (SPONGE) [24], to infer miRNA sponge interaction networks. For the SC method, a lncRNA–mRNA pair is regarded as a miRNA sponge interaction pair if it has a p-value <0.05 in the hypergeometric test of common miRNAs, a p-value <0.05 for positive correlation, and a sensitivity correlation of more than 0.1. For the SPONGE method, we use the default parameter settings in the SPONGE R package [24] to predict miRNA sponge interaction pairs. For each method, all the identified miRNA sponge interaction pairs are combined into a miRNA sponge interaction network. Next, we use four graph clustering methods, FN (also called fast greedy network algorithm) [50], MCL [51], MCODE [58] and LINKCOMM [59], implemented in the miRspongeR R package [11], to predict miRNA sponge modules from the miRNA sponge interaction networks identified by both the SC and SPONGE methods. As before, each identified module should contain at least two sponge lncRNAs and two mRNAs.
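The three SC criteria can be illustrated with a small Python sketch. The hypergeometric upper-tail p-value is computed from binomial coefficients, and the three thresholds stated above are then combined in one predicate. The parameterisation (N background miRNAs, K targeting the lncRNA, n targeting the mRNA, k shared), the function names and the example numbers are assumptions for illustration; the correlation p-value and the sensitivity correlation are taken as given inputs, to be computed as described in [57].

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric variable: n draws without replacement
    from a population of N items of which K are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

def is_sponge_pair(p_shared, p_pos_corr, sensitivity_corr,
                   alpha=0.05, min_sc=0.1):
    """Apply the three SC thresholds quoted in the text."""
    return p_shared < alpha and p_pos_corr < alpha and sensitivity_corr > min_sc

# Hypothetical example: 100 background miRNAs, 10 target the lncRNA,
# 8 target the mRNA, and 4 are shared by both.
p_shared = hypergeom_upper_tail(k=4, N=100, K=10, n=8)
print(p_shared < 0.05)
print(is_sponge_pair(p_shared, p_pos_corr=0.01, sensitivity_corr=0.25))
```

A pair failing any one of the three tests, for example one with a sensitivity correlation of 0.05, would not be added to the interaction network.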

For comparison, in miRSM, we select the fabia method (the best performer in identifying co-expression modules) in Stage 1 and the SCC, SDC and SRVC methods in Stage 2 to identify miRNA sponge modules. We compare miRSM and GC in terms of five aspects: percentage of functional modules, percentage of BRCA-related modules, percentage of significant co-expression modules, percentage of potential biomarkers, and number of validated miRNA sponge interactions (see Figure 6A). In addition, we also use the overall rank score to evaluate the performance of each method. As shown in Figure 6B, all the methods in miRSM outperform the GC methods. The detailed results of the GC methods can be found in Supplementary File S7.

Figure 6. — Comparison with traditional graph-based clustering methods. (A) Comparison in terms of five aspects, including percentage of functional modules, percentage of BRCA-related modules, percentage of significant co-expression LncmiRSMs, percentage of potential biomarkers, number of validated miRNA sponge interactions. (B) Overall rank score of module discovery methods in *miRSM* and GC

Conclusions and discussions

miRSM has been released under the GPL-3.0 Licence, and is available at http://bioconductor.org/packages/miRSM/ and https://github.com/zhangjunpeng411/miRSM. The user manual of miRSM provides examples illustrating the use of the utility functions at https://bioconductor.org/packages/release/bioc/vignettes/miRSM/inst/doc/miRSM.html. For users’ convenience, we set the default methods in the utility functions to the frequently used methods for identifying gene co-expression modules and miRNA sponge modules.

In the form of modules, miRNA sponges act in concert to carry out biological functions and are highly implicated in human physiological and pathological processes. In this work, we introduce the miRSM R package for inferring and analysing miRNA sponge modules by integrating expression data and miRNA–target interactions. The comparison result has shown that miRSM performs better than the Graph-based Clustering (GC) methods in the case study, indicating that it is desirable to infer miRNA sponge modules from heterogeneous data. Moreover, miRSM is a comprehensive tool including 50 co-expression analysis methods for identifying gene co-expression modules, four module discovery methods to identify miRNA sponge modules, and seven modular analysis methods to investigate miRNA sponge modules. In addition, we also provide a parameter selection guideline for identifying miRNA sponge modules in new datasets. Finally, to help users select the preferred methods for their own datasets, we propose an overall rank score to conduct a comprehensive evaluation of co-expression analysis methods and module discovery methods.

The previous tool miRspongeR [11] mainly uses GC methods to infer miRNA sponge modules based on the identified miRNA sponge interaction network. Different from miRspongeR, miRSM directly infers miRNA sponge modules from heterogeneous data (including multiple expression data and putative miRNA–target interactions). Specifically, when predicting miRNA sponge modules, miRSM has three main advantages over miRspongeR. First, miRSM provides gene co-expression analysis, which is conducive to discovering collaborative or synergistic sponge RNAs and target RNAs for modular competition. Second, by calculating matrix correlation (MC) as competition strength between sponge RNAs and target RNAs, miRSM could investigate the competitive effects of miRNA sponges on biological conditions. Third, by computing sensitivity matrix correlation (SMC) to evaluate the influence of shared miRNAs on the sponge RNAs and target RNAs, miRSM could help to understand the miRNA sponging activities in biological conditions.

From the standpoint of computational biology, miRSM could help users understand the modular mechanism of miRNA sponges in complex human diseases. In the future, we will continuously incorporate new methods into the package to infer miRNA sponge modules. We believe that miRSM can be an effective tool for the research of miRNA sponge modules in human cancer.

Acknowledgments

We are grateful to the Bioconductor Project for the valuable comments on the code, which greatly improved the miRSM package.

Funding Statement

This work was supported by the National Natural Science Foundation of China under Grant (No. 61963001, No. 61702069, No. 61872405, No. 61720106004, and No. 61902372); Yunnan Fundamental Research Projects under Grant (No. 202001AT070024); National Health & Medical Research Council (NHMRC) under Grant (No. 1123042); and Australian Research Council Discovery under Grant (No. DP170101306).

Disclosure statement

The authors have declared that no competing interests exist.

Supplementary material

Supplemental data for this article can be accessed here.

References

[1] Ambros V. The functions of animal microRNAs. Nature. 2004;431(7006):350–355.
[2] Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–297.
[3] Lin S, Gregory RI. MicroRNA biogenesis pathways in cancer. Nat Rev Cancer. 2015;15(6):321–333.
[4] Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20(2):515–539.
[5] Salmena L, Poliseno L, Tay Y, et al. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell. 2011;146(3):353–358.
[6] Tay Y, Rinn J, Pandolfi PP. The multilayered complexity of ceRNA crosstalk and competition. Nature. 2014;505(7483):344–352.
[7] Le TD, Zhang J, Liu L, et al. Computational methods for identifying miRNA sponge interactions. Brief Bioinform. 2017;18(4):577–590.
[8] Qi X, Lin Y, Chen J, et al. Decoding competing endogenous RNA networks for cancer biomarker discovery. Brief Bioinform. 2020;21(2):441–457.
[9] Poliseno L, Salmena L, Zhang J, et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature. 2010;465(7301):1033–1038.
[10] Choobdar S, Ahsen ME, Crawford J, et al. Assessment of network module identification across complex diseases. Nat Methods. 2019;16(9):843–852.
[11] Zhang J, Liu L, Xu T, et al. miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules. BMC Bioinformatics. 2019;20(1):235.
[12] Zhang J, Xu T, Liu L, et al. LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer. PLoS Comput Biol. 2020;16(4):e1007851.
[13] Bouzeghoub M, Lóscio BF, Kedad Z, et al. Heterogeneous data source integration and evolution. International Conference on Database and Expert Systems Applications. Springer, Berlin, Heidelberg, 2002: 751–757.
[14] Zhang J, Le TD, Liu L, et al. Identifying miRNA sponge modules using biclustering and regulatory scores. BMC Bioinformatics. 2017;18(Suppl S3):44.
[15] Deng J, Kong W, Wang S, et al. Prior knowledge driven joint NMF algorithm for ceRNA co-module identification. International Journal of Biological Sciences. 2018;14(13):1822–1833.
[16] Xiao Q, Luo J, Liang C, et al. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics. 2019;20(1):67.
[17] Wen X, Gao L, Hu Y. LAceModule: identification of competing endogenous RNA modules by integrating dynamic correlation. Frontiers in Genetics. 2020;11:235.
[18] Jiang D, Tang C, Zhang A. Cluster analysis for gene expression data: a survey. IEEE Transactions on Knowledge and Data Engineering. 2004;16(11):1370–1386.
[19] Xie J, Ma A, Fennell A, et al. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform. 2019;20(4):1449–1464.
[20] Csardi G, Nepusz T. The igraph software package for complex network research. Int J Complex Syst. 2006;1695:1–9.
[21] Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10(3):515–534.
[22] Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2007;35(6):2769–2794.
[23] Robert P, Escoufier Y. A unifying tool for linear multivariate statistical methods: the RV-coefficient. Appl Stat. 1976;25(3):257–265.
[24] List M, Dehghani Amirabad A, Kostka D, et al. Large-scale inference of competing endogenous RNA networks with sparse partial correlation. Bioinformatics. 2019;35(14):i596–i604.
[25] Shao T, Wu A, Chen J, et al. Identification of module biomarkers from the dysregulated ceRNA–ceRNA interaction network in lung adenocarcinoma. Mol Biosyst. 2015;11(11):3048–3058.
[26] Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–29.
[27] Kanehisa M. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
[28] Jassal B, Matthews L, Viteri G, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48(D1):D498–D503.
[29] Schriml LM, Mitraka E, Munro J, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47(D1):D955–D962.
[30] Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–D855.
[31] Repana D, Nulsen J, Dressler L, et al. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol. 2019;20(1):1.
[32] Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics. 2013;45(10):1113–1120.
[33] Hastie T, Tibshirani R, Narasimhan B, et al. Impute: imputation for microarray data. R package version 1.64.0. 2020. DOI: 10.18129/B9.bioc.impute
[34] Tweedie S, Braschi B, Gray K, et al. Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Res. 2020;gkaa980. DOI: 10.1093/nar/gkaa980.
[35] Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
[36] Hao Y, Wu W, Li H, et al. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford). 2016;2016:baw057.
[37] Paraskevopoulou MD, Vlachos IS, Karagkouni D, et al. DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts. Nucleic Acids Res. 2016;44(D1):D231–D238.
[38] Chou C-H, Shrestha S, Yang C-D, et al. miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic Acids Res. 2018;46(D1):D296–D302.
[39] Vlachos IS, Paraskevopoulou MD, Karagkouni D, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res. 2015;43(D1):D153–D159.
[40] Dweep H, Gretz N. miRWalk2.0: a comprehensive atlas of microRNA-target interactions. Nat Methods. 2015;12(8):697.
[41] Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017;45(D1):D777–D783.
[42] Bao Z, Yang Z, Huang Z, et al. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2019;47(D1):D1034–D1037.
[43] Gao Y, Wang P, Wang Y, et al. Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers. Nucleic Acids Res. 2019;47(D1):D1028–D1033.
[44] Cui T, Zhang L, Huang Y, et al. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res. 2018;46(D1):D371–D374.
[45] Wang P, Zhi H, Zhang Y, et al. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs. Database (Oxford). 2015;2015:bav098.
[46] Pian C, Zhang G, Tu T, et al. LncCeRBase: a database of experimentally validated human competing endogenous long non-coding RNAs. Database (Oxford). 2018;2018:bay061.
[47] Wang P, Li X, Gao Y, et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res. 2019;47(D1):D121–D127.
[48] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559.
[49] Bunte K, Leppäaho E, Saarinen I, et al. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics. 2016;32(16):2457–2463.
[50] Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;70(6):066111.
[51] Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–1584.
[52] Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics. 2010;11(1):367.
[53] Forgy EW. Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics. 1965;21(3):768–769.
[54] Hochreiter S, Bodenhofer U, Heusel M, et al. FABIA: factor analysis for bicluster acquisition. Bioinformatics. 2010;26(12):1520–1527.
[55] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM T Intel Syst Tec. 2011;2(3):27.
[56] Meyer D, Dimitriadou E, Hornik K, et al. Functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien, R package version 1.7-4. 2020. [accessed 21 Aug 2020]. Available from: https://CRAN.R-project.org/package=e1071.
[57] Paci P, Colombo T, Farina L. Computational analysis identifies a sponge interaction network between long non-coding RNAs and messenger RNAs in human breast cancer. BMC Syst Biol. 2014;8:83.
[58] Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4(1):2.
[59] Kalinka AT, Tomancak P. linkcomm: an R package for the generation, visualization, and analysis of link communities in networks of arbitrary size and type. Bioinformatics. 2011;27(14):2011–2012.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Material

Click here for additional data file.^{(172.4KB, zip)}

[cit0001] [1].Ambros V. The functions of animal microRNAs. Nature. 2004;431(7006):350–355. [DOI] [PubMed] [Google Scholar]

[cit0002] [2].Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–297. [DOI] [PubMed] [Google Scholar]

[cit0003] [3].Lin S, Gregory RI. MicroRNA biogenesis pathways in cancer. Nat Rev Cancer. 2015;15(6):321–333. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0004] [4].Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20(2):515–539. [DOI] [PubMed] [Google Scholar]

[cit0005] [5].Salmena L, Poliseno L, Tay Y, et al. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell. 2011;146(3):353–358. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0006] [6].Tay Y, Rinn J, Pandolfi PP. The multilayered complexity of ceRNA crosstalk and competition. Nature. 2014;505(7483):344–352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0007] [7].Le TD, Zhang J, Liu L, et al. Computational methods for identifying miRNA sponge interactions. Brief Bioinform. 2017;18(4):577–590. [DOI] [PubMed] [Google Scholar]

[cit0008] [8].Qi X, Lin Y, Chen J, et al. Decoding competing endogenous RNA networks for cancer biomarker discovery. Brief Bioinform. 2020;21(2):441–457. [DOI] [PubMed] [Google Scholar]

[cit0009] [9].Poliseno L, Salmena L, Zhang J, et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature. 2010;465(7301):1033–1038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0010] [10].Choobdar S, Ahsen ME, Crawford J, et al. Assessment of network module identification across complex diseases. Nat Methods. 2019;16(9):843–852. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0011] [11].Zhang J, Liu L, Xu T, et al. miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules. BMC Bioinformatics. 2019;20(1):235. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0012] [12].Zhang J, Xu T, Liu L, et al. LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer. PLoS Comput Biol. 2020;16(4):e1007851. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0013] [13].Bouzeghoub M, Lóscio BF, Kedad Z, et al. Heterogeneous data source integration and evolution. International Conference on Database and Expert Systems Applications. Springer, Berlin, Heidelberg, 2002: 751–757. [Google Scholar]

[cit0014] [14].Zhang J, Le TD, Liu L, et al. Identifying miRNA sponge modules using biclustering and regulatory scores. BMC Bioinformatics. 2017;18(Suppl S3):44. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0015] [15].Deng J, Kong W, Wang S, et al. Prior knowledge driven joint NMF algorithm for ceRNA co-module identification. International Journal of Biological Sciences. 2018;14(13):1822–1833. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0016] [16].Xiao Q, Luo J, Liang C, et al. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics. 2019;20(1):67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0017] [17].Wen X, Gao L, Hu Y. LAceModule: identification of competing endogenous RNA modules by integrating dynamic correlation. Frontiers in Genetics. 2020;11:235. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0018] [18].Jiang D, Tang C, Zhang A. Cluster analysis for gene expression data: a survey. IEEE Transactions on Knowledge and Data Engineering. 2004;16(11):1370–1386. [Google Scholar]

[cit0019] [19].Xie J, Ma A, Fennell A, et al. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform. 2019;20(4):1449–1464. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0020] [20].Csardi G, Nepusz T. The igraph software package for complex network research. Int J Complex Syst. 2006;1695:1–9. [Google Scholar]

[cit0021] [21].Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10(3):515–534. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0022] [22].Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Stat. 2008;35(6):2769–2794. [Google Scholar]

[cit0023] [23].Robert P, Escoufier Y. A unifying tool for linear multivariate statistical methods: the RV- coefficient. Appl Stat. 1976;25(3):257–265. [Google Scholar]

[cit0024] [24].List M, Dehghani Amirabad A, Kostka D, et al. Large-scale inference of competing endogenous RNA networks with sparse partial correlation. Bioinformatics. 2019;35(14):i596–i604. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0025] [25].Shao T, Wu A, Chen J, et al. Identification of module biomarkers from the dysregulated ceRNA–ceRNA interaction network in lung adenocarcinoma. Mol Biosyst. 2015;11(11):3048–3058. [DOI] [PubMed] [Google Scholar]

[cit0026] [26].Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. Nat Genet. The Gene Ontology Consortium. 2000;25(1):25–29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0027] [27].Kanehisa M. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0028] [28].Jassal B, Matthews L, Viteri G, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020;48(D1):D498–D503. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0029] [29].Schriml LM, Mitraka E, Munro J, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47(D1):D955–D962. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0030] [30].Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–D855. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0031] [31].Repana D, Nulsen J, Dressler L, et al. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol. 2019;20(1):1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0032] [32].Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics. 2013;45(10):1113–1120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0033] [33].Hastie T, Tibshirani R, Narasimhan B, et al. Impute: imputation for microarray data. R package version 1.64.0. 2020. DOI: 10.18129/B9.bioc.impute [DOI]

[cit0034] [34].Tweedie S, Braschi B, Gray K, et al. Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Res. 2020;gkaa980. DOI: 10.1093/nar/gkaa980. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0035] [35].Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0036] [36].Hao Y, Wu W, Li H, et al. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford). 2016;2016:baw057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0037] [37].Paraskevopoulou MD, Vlachos IS, Karagkouni D, et al. DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts. Nucleic Acids Res. 2016;44(D1):D231–D238. [DOI] [PMC free article] [PubMed] [Google Scholar]

[cit0038] [38].Chou C-H, Shrestha S, Yang C-D, et al. miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic Acids Res. 2017;18(S3):D296–D302. [DOI] [PMC free article] [PubMed] [Google Scholar]

[39] Vlachos IS, Paraskevopoulou MD, Karagkouni D, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res. 2015;43(D1):D153–D159.

[40] Dweep H, Gretz N. miRWalk2.0: a comprehensive atlas of microRNA-target interactions. Nat Methods. 2015;12(8):697.

[41] Forbes SA, Beare D, Boutselakis H, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017;45(D1):D777–D783.

[42] Bao Z, Yang Z, Huang Z, et al. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2019;47(D1):D1034–D1037.

[43] Gao Y, Wang P, Wang Y, et al. Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers. Nucleic Acids Res. 2019;47(D1):D1028–D1033.

[44] Cui T, Zhang L, Huang Y, et al. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res. 2018;46(D1):D371–D374.

[45] Wang P, Zhi H, Zhang Y, et al. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs. Database (Oxford). 2015;2015:bav098.

[46] Pian C, Zhang G, Tu T, et al. LncCeRBase: a database of experimentally validated human competing endogenous long non-coding RNAs. Database (Oxford). 2018;2018:bay061.

[47] Wang P, Li X, Gao Y, et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res. 2019;47(D1):D121–D127.

[48] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9(1):559.

[49] Bunte K, Leppäaho E, Saarinen I, et al. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics. 2016;32(16):2457–2463.

[50] Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;70(6):066111.

[51] Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–1584.

[52] Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics. 2010;11(1):367.

[53] Forgy EW. Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics. 1965;21(3):768–769.

[54] Hochreiter S, Bodenhofer U, Heusel M, et al. FABIA: factor analysis for bicluster acquisition. Bioinformatics. 2010;26(12):1520–1527.

[55] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM T Intel Syst Tec. 2011;2(3):27.

[56] Meyer D, Dimitriadou E, Hornik K, et al. Functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien, R package version 1.7-4. 2020. [accessed 21 Aug 2020]. Available from: https://CRAN.R-project.org/package=e1071.

[57] Paci P, Colombo T, Farina L. Computational analysis identifies a sponge interaction network between long non-coding RNAs and messenger RNAs in human breast cancer. BMC Syst Biol. 2014;8(1):83.

[58] Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4(1):2.

[59] Kalinka AT, Tomancak P. linkcomm: an R package for the generation, visualization, and analysis of link communities in networks of arbitrary size and type. Bioinformatics. 2011;27(14):2011–2012.
