LMSM: A modular approach for identifying lncRNA related miRNA sponge modules in breast cancer

Junpeng Zhang; Taosheng Xu; Lin Liu; Wu Zhang; Chunwen Zhao; Sijing Li; Jiuyong Li; Nini Rao; Thuc Duy Le

doi:10.1371/journal.pcbi.1007851

PLoS Comput Biol. 2020 Apr 23;16(4):e1007851.

Editor: Teresa M Przytycka

PMCID: PMC7200020 PMID: 32324747

Abstract

Until now, existing methods for identifying lncRNA related miRNA sponge modules have mainly relied on lncRNA related miRNA sponge interaction networks, which may not provide a full picture of miRNA sponging activities in biological conditions. Hence there is a strong need for new computational methods to identify lncRNA related miRNA sponge modules. In this work, we propose a framework, LMSM, to identify LncRNA related MiRNA Sponge Modules from heterogeneous data. To understand the miRNA sponging activities in biological conditions, LMSM uses gene expression data to evaluate the influence of the shared miRNAs on the clustered sponge lncRNAs and mRNAs. We have applied LMSM to the human breast cancer (BRCA) dataset from The Cancer Genome Atlas (TCGA). As a result, we have found that the majority of LMSM modules are significantly implicated in BRCA and most of them are BRCA subtype-specific. Most of the mediating miRNAs act as crosslinks across different LMSM modules, and all LMSM modules are statistically significant. Multi-label classification analysis shows that the performance of LMSM modules is significantly higher than the baseline’s performance, indicating the biological relevance of LMSM modules in classifying BRCA subtypes. The consistent results suggest that LMSM is robust in identifying lncRNA related miRNA sponge modules. Moreover, LMSM can be used to predict miRNA targets. Finally, LMSM outperforms a graph clustering-based strategy in identifying BRCA-related modules. Altogether, our study shows that LMSM is a promising method to investigate the modular regulatory mechanism of sponge lncRNAs from heterogeneous data.

Author summary

Previous studies have revealed that long non-coding RNAs (lncRNAs), as microRNA (miRNA) sponges or competing endogenous RNAs (ceRNAs), can regulate the expression levels of messenger RNAs (mRNAs) by decreasing the amount of miRNAs interacting with mRNAs. In this work, we hypothesize that the “tug-of-war” between RNA transcripts for attracting miRNAs is across groups or modules. Based on the hypothesis, we propose a framework called LMSM, to identify LncRNA related MiRNA Sponge Modules. Based on the two miRNA sponge modular competition principles, significant sharing of miRNAs and high canonical correlation between the sponge lncRNAs and mRNAs, LMSM is also capable of predicting miRNA targets. LMSM not only extends the ceRNA hypothesis, but also provides a novel way to investigate the biological functions and modular mechanism of lncRNAs in breast cancer.

Introduction

Long non-coding RNAs (lncRNAs) are RNA transcripts with more than 200 nucleotides (nts) in length [1]. More and more evidence has shown that lncRNAs play important functional roles in many biological processes, including human cancers [2–4]. As a major class of non-coding RNAs (ncRNAs), lncRNAs have attracted increasing interest from researchers in their exploration of non-coding knowledge from the ‘junk’.

Among the wide range of biological functions of lncRNAs, their role as competing endogenous RNAs (ceRNAs) or miRNA sponges is in the limelight. As a family of small ncRNAs (~18nts in length), miRNAs are important post-transcriptional regulators of gene expression [5,6]. According to the ceRNA hypothesis [7], lncRNAs contain abundant miRNA response elements (MREs) for competitively sequestering target mRNAs from miRNAs’ control. This regulation mechanism of lncRNAs when acting as miRNA sponges is highly implicated in various human diseases [8], including breast cancer [9]. For example, lncRNA H19, an imprinted gene, is associated with breast cancer cell clonogenicity, migration and mammosphere-forming ability. By sponging miRNA let-7, H19 forms an H19/let-7/LIN28 reciprocal negative regulatory circuit that plays a critical role in breast cancer stem cell maintenance [10].

To systematically investigate the functions of lncRNAs as miRNA sponges in human cancer, a series of computational methods have been developed to infer lncRNA related miRNA sponge interaction networks. The methods can be divided into three categories according to the statistical or computational techniques employed: pair-wise correlation based approach, partial association based approach, and mathematical modelling approach [11].

It is commonly known that to implement a specific biological function, genes tend to cluster or connect in the form of modules or communities. Consequently, based on the identified lncRNA related miRNA sponge interaction networks, several methods [12–17] using graph clustering algorithms were developed to identify lncRNA related miRNA sponge modules. For the identification of sponge lncRNA-mRNA pairs, most existing methods consider only their pair-wise correlation. Since the lncRNA related miRNA sponge interaction networks are created by simply putting together sponge lncRNA-mRNA pairs, even when the expression levels of each sponge lncRNA-mRNA pair are highly correlated, the collective correlation between the set of sponge lncRNAs and the set of mRNAs in the same identified module is not necessarily high. As we know, the pair-wise positive correlation between the expression levels of a lncRNA and a mRNA is commonly used to identify the sponge interactions between them. For the identification of lncRNA related miRNA sponge modules, it is also necessary to investigate whether the clustered sponge lncRNAs and mRNAs in a module have high collective positive correlation or not. Moreover, these methods do not consider the influence of the shared miRNAs on the expression of the clustered sponge lncRNAs and mRNAs. It is known that the “tug-of-war” between sponge lncRNAs and mRNAs is mediated by miRNAs. Therefore, it is extremely important to consider the influence of the shared miRNAs in identifying lncRNA related miRNA sponge modules.

Recently, to study lncRNA, miRNA and mRNA-associated regulatory modules, Deng et al. [18] and Xiao et al. [19] have proposed two types of joint matrix factorization methods to identify mRNA-miRNA-lncRNA co-modules by integrating gene expression data and putative miRNA-target interactions. However, it is still not clear how the shared miRNAs influence the expression level of the sponge lncRNAs and mRNAs in a module.

To address the above issues, we firstly hypothesize that sponge lncRNAs form a group to competitively release a group of target mRNAs from the control of the miRNAs shared by the lncRNAs and mRNAs (for details, see Section Materials and methods). We name this hypothesis the miRNA sponge modular competition hypothesis in this paper. Then based on the hypothesis, we propose a novel framework to identify LncRNA related MiRNA Sponge Modules (LMSM). The framework firstly uses the WeiGhted Correlation Network Analysis (WGCNA) [20] method to generate lncRNA-mRNA co-expression modules. Next, by incorporating matched miRNA expression and putative miRNA-target interactions, LMSM applies three constraints (see Section Materials and methods) to obtain lncRNA related miRNA sponge modules (also called LMSM modules in this paper). One of the constraints, high canonical correlation, is used to assess whether the group of sponge lncRNAs and the group of mRNAs in the same module have a high collective positive correlation or not. Another constraint, adequate sensitivity canonical correlation conditioning on a group of miRNAs, is used to evaluate the influence of the shared miRNAs on the clustered sponge lncRNAs and mRNAs.

To evaluate the LMSM approach, we apply it to matched miRNA, lncRNA and mRNA expression data, and clinical information of the breast cancer (BRCA) dataset from The Cancer Genome Atlas (TCGA). The modular analysis results demonstrate that LMSM can help to uncover the modular regulatory mechanism of sponge lncRNAs in BRCA. LMSM is released under the GPL-3.0 License, and is freely available through a GitHub repository (https://github.com/zhangjunpeng411/LMSM).

Materials and methods

A hypothesis on miRNA sponge modular competition

The ceRNA hypothesis [7] indicates that a pool of RNA transcripts (known as ceRNAs) regulate each other’s transcripts by competing for the shared miRNAs through MREs. Based on this unifying hypothesis, a large-scale gene regulatory network including coding and non-coding RNAs across the transcriptome can be formed, and it plays critical roles in human physiological and pathological processes. However, by using MREs as letters of language, the hypothesis only depicts the crosstalk between individual RNA transcripts (e.g. coding RNAs, lncRNAs, circRNAs or pseudogenes) and mRNAs at the pair-wise interaction level, and the crosstalk between RNA transcripts and mRNAs at the module level is still an open question.

There has been evidence showing that for the same transcriptional regulatory program, biological process or signaling pathway, genes tend to form modules or communities to coordinate biological functions [21]. These modules correspond to functional units in complex biological systems, and they play an important role in gene regulation. Based on these findings, in this paper, we hypothesize that regarding miRNA sponging, the crosstalk between different RNA transcripts is in the form of modular competition. We call the hypothesis the miRNA sponge modular competition hypothesis.

As shown in Fig 1, based on our hypothesis, instead of having pair-wise competitions, miRNA sponges form groups to compete at module level for common miRNAs. Here, a miRNA sponge module consists of a competing group (other coding RNA group, pseudogene group, circRNA group or lncRNA group) and a mRNA group. Here, other coding RNAs also include mRNAs. From the perspective of modularity, the hypothesis at module level extends the ceRNA hypothesis and provides a new channel to look into the functions and regulatory mechanism of miRNA sponges or ceRNAs. Since the available resources of lncRNAs are more abundant than those of other non-coding RNAs (e.g. circRNAs and pseudogenes), in this paper, we focus on the competition between lncRNAs and mRNAs to validate and demonstrate the proposed miRNA sponge modular competition hypothesis. Our goal is to discover lncRNA related sponge modules, or LMSM modules. Here each LMSM module contains a group of lncRNAs which compete collectively with a group of mRNAs for sponging the same set of miRNAs.

The LMSM framework

Overview of LMSM

As shown in Fig 2, the proposed LMSM framework comprises two stages. In stage 1, the WGCNA method [20] is used for finding lncRNA-mRNA co-expression modules from matched lncRNA and mRNA expression data. Then in stage 2, LMSM identifies LMSM modules from the lncRNA-mRNA co-expression modules using three criteria. That is, a co-expression module is considered as a LMSM module if the group of lncRNAs and the group of mRNAs in the co-expression module: (1) have significant sharing of miRNAs, (2) have high canonical correlation between their expression levels, and (3) have adequate sensitivity canonical correlation conditioning on their shared miRNAs. LMSM checks the criteria one by one, and once a co-expression module does not meet a criterion, it is discarded and will not be checked for the next criterion. In the following, we will describe the two stages in detail.

Identifying lncRNA-mRNA co-expression modules

For identifying lncRNA-mRNA co-expression modules, we use the WGCNA method. WGCNA is a popular method for identifying co-expressed genes across samples and it can be used to identify clusters of highly co-expressed lncRNAs and mRNAs. In our task, we use the matched lncRNA and mRNA expression data as input to the WGCNA R package [20] to identify lncRNA-mRNA co-expression modules. We use the scale-free topology criterion for soft thresholding. The coefficient of determination R² (ranging from 0 to 1) is used to quantify the goodness of scale-free topology, and larger R² values mean better scale-free topology. Normally, an R² value larger than 0.8 in the power-law curve fit is regarded as good in the WGCNA method. Therefore, the desired minimum scale-free topology fitting index R² is set to 0.8 in this work.
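The soft-thresholding rule above can be illustrated with a minimal sketch. It assumes that the scale-free fit R² has already been computed for each candidate power (the paper does this with the WGCNA R package; Python is used here only as a stand-in), and that the smallest power reaching the 0.8 cut-off is selected. The function name and the example R² values are hypothetical.

```python
def pick_soft_threshold(fit_by_power, min_r2=0.8):
    """Return the smallest power whose scale-free fit R^2 reaches min_r2.

    fit_by_power maps a candidate soft-thresholding power to its R^2.
    Returns None when no candidate reaches the cut-off.
    """
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= min_r2:
            return power
    return None


# Hypothetical R^2 values per candidate power (not taken from the paper).
example_fit = {1: 0.21, 2: 0.48, 3: 0.67, 4: 0.79, 5: 0.84, 6: 0.88}
chosen_power = pick_soft_threshold(example_fit)
```

With these illustrative values the rule skips power 4 (R² = 0.79) and selects the first power at or above the cut-off.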

Inferring lncRNA related miRNA sponge modules

To identify lncRNA related miRNA sponge modules from the co-expression modules obtained in stage 1, we propose three criteria (detailed below) by following the key tenet of our miRNA sponge modular competition hypothesis. That is, a group of lncRNAs (acting as miRNA sponges) competes with a group of mRNAs with respect to a set of miRNAs shared by the two groups.

The first criterion requires that the group of lncRNAs and the group of mRNAs in a miRNA sponge module have a significant sharing of a set of miRNAs. LMSM uses a hypergeometric test to assess the significance of the sharing of miRNAs between the group of lncRNAs and the group of mRNAs in a co-expression module, based on putative miRNA-target interactions. The p-value for the test is computed as:

$$p\text{-value} = 1 - \sum_{i_{1} = 0}^{L_{1} - 1} \frac{\binom{M_{1}}{i_{1}} \binom{N_{1} - M_{1}}{K_{1} - i_{1}}}{\binom{N_{1}}{K_{1}}} \tag{1}$$

In the equation, N₁ is the number of all miRNAs in the dataset, M₁ and K₁ denote the total numbers of miRNAs interacting with the group of lncRNAs and the group of mRNAs in the co-expression module respectively, and L₁ (e.g. 3) is the number of miRNAs shared by the group of lncRNAs and the group of mRNAs in the co-expression module.
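As a minimal sketch of this test, the upper-tail hypergeometric p-value can be computed directly from the binomial coefficients in the equation. The function and argument names are hypothetical, the numbers in the example are toy values rather than data from the paper, and Python is used only for illustration.

```python
from math import comb


def mirna_sharing_pvalue(n_all, m_lnc, k_mrna, l_shared):
    """P(overlap >= l_shared) under the hypergeometric null.

    n_all    : number of miRNAs in the dataset (N1)
    m_lnc    : miRNAs interacting with the lncRNA group (M1)
    k_mrna   : miRNAs interacting with the mRNA group (K1)
    l_shared : miRNAs shared by the two groups (L1)
    """
    total = comb(n_all, k_mrna)
    # Sum the probabilities of observing fewer than l_shared shared miRNAs.
    lower_tail = sum(
        comb(m_lnc, i) * comb(n_all - m_lnc, k_mrna - i)
        for i in range(l_shared)
    )
    return 1 - lower_tail / total


# Toy example: 10 miRNAs overall, 4 and 3 per group, 2 shared.
p_toy = mirna_sharing_pvalue(10, 4, 3, 2)
```

Because the result is obtained as one minus a sum, very small p-values can round to 0 in floating point; summing the upper tail instead is an equivalent alternative when that matters.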

The second criterion is to ensure that the sponge modular competition between the group of lncRNAs and the group of mRNAs in a miRNA sponge module is strong enough. In existing work, to identify lncRNA related miRNA sponge interactions, a principle followed is that the expression level of a lncRNA and the expression level of a mRNA need to be strongly and positively correlated. Following the same principle on strong positive correlation in expression levels while considering our modular competition hypothesis, LMSM requires the collective correlation between the expression levels of the group of lncRNAs and the group of target mRNAs in the same module to be strong and positive. To assess the collective correlation, we perform canonical correlation analysis [22] to obtain the canonical correlation between the group of lncRNAs and the group of mRNAs in a co-expression module. Let the two column vectors $X = {(x_{1}, x_{2}, …, x_{m})}^{T}$ and $Y = {(y_{1}, y_{2}, …, y_{n})}^{T}$ represent the group of lncRNAs and the group of mRNAs in a co-expression module respectively. $Σ_{XX}$, $Σ_{YY}$ and $Σ_{XY}$ are the covariance and cross-covariance matrices calculated from the expression data of X and Y. The canonical correlation analysis seeks the canonical vectors a ($a \in ℝ^{m}$) and b ($b \in ℝ^{n}$) which maximize the correlation $\mathrm{corr}(a^{T} X, b^{T} Y)$. The canonical correlation between the group of lncRNAs and the group of mRNAs, denoted as CC_lncR-mR, is then calculated as follows with the found canonical vectors:

$$CC_{\text{lncR-mR}} = \mathrm{corr}(a^{T} X, b^{T} Y) = \frac{a^{T} Σ_{XY} b}{\sqrt{a^{T} Σ_{XX} a} \sqrt{b^{T} Σ_{YY} b}} \tag{2}$$

In this work, we use the PMA R package [23] to compute canonical correlation.
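For orientation, in classical (unpenalized) canonical correlation analysis, and assuming $Σ_{XX}$ and $Σ_{YY}$ are invertible, the maximizing vectors satisfy an eigenvalue problem. This is a textbook characterization given here as background only; the PMA package implements a penalized variant, so its solution need not coincide with this closed form.

$$Σ_{XX}^{-1} Σ_{XY} Σ_{YY}^{-1} Σ_{YX} \, a = ρ^{2} a, \qquad b \propto Σ_{YY}^{-1} Σ_{YX} \, a, \qquad Σ_{YX} = Σ_{XY}^{T}$$

Here $ρ^{2}$ is the largest eigenvalue, and the first canonical correlation is $ρ$. When the number of genes in a group exceeds the number of samples, the sample covariance matrices are singular and this closed form is not available, which is one reason a penalized method is a natural choice for expression data.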

Finally, the third criterion, adapted from the sensitivity correlation [24], is employed to assess whether the miRNAs shared by the group of lncRNAs and the group of mRNAs in a module have a large enough influence on the modular competition between the two groups of RNAs. To check this criterion, we incorporate miRNA expression data and compute SCC_lncR-mR, the sensitivity canonical correlation between the group of lncRNAs and the group of mRNAs in a co-expression module, as follows:

$$SCC_{\text{lncR-mR}} = CC_{\text{lncR-mR}} - PCC_{\text{lncR-mR}} \tag{3}$$

where PCC_lncR-mR is the partial canonical correlation between the group of lncRNAs and the group of mRNAs, i.e. the canonical correlation conditioning on the expression of their shared miRNAs in the co-expression module, or the canonical correlation between the two groups of RNAs when the influence of the shared miRNAs is eliminated. Therefore, from Eq (3), we see that SCC_lncR-mR implies the correlation between the two groups of RNAs under the influence of their shared miRNAs.

PCC_lncR-mR in Eq (3) can be calculated as:

$$PCC_{\text{lncR-mR}} = \frac{CC_{\text{lncR-mR}} - CC_{\text{miR-mR}} \, CC_{\text{miR-lncR}}}{\sqrt{1 - CC_{\text{miR-mR}}^{2}} \sqrt{1 - CC_{\text{miR-lncR}}^{2}}} \tag{4}$$

where CC_miR-mR (CC_miR-lncR) is the canonical correlation between the set of miRNAs in the co-expression module and the group of mRNAs (lncRNAs) in the co-expression module.
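The two formulas can be sketched as plain arithmetic on the three canonical correlations. The sketch below assumes those correlations have already been computed elsewhere; the function names and the example values are hypothetical, and Python is used only for illustration.

```python
from math import sqrt


def partial_canonical_correlation(cc_lnc_m, cc_mir_m, cc_mir_lnc):
    """Canonical correlation between lncRNAs and mRNAs given the shared miRNAs."""
    denom = sqrt(1 - cc_mir_m ** 2) * sqrt(1 - cc_mir_lnc ** 2)
    if denom == 0:
        raise ValueError("miRNA canonical correlations must be below 1 in magnitude")
    return (cc_lnc_m - cc_mir_m * cc_mir_lnc) / denom


def sensitivity_canonical_correlation(cc_lnc_m, cc_mir_m, cc_mir_lnc):
    """Drop in canonical correlation once the shared miRNAs are conditioned on."""
    return cc_lnc_m - partial_canonical_correlation(cc_lnc_m, cc_mir_m, cc_mir_lnc)


# Hypothetical values: CC = 0.9, and both miRNA correlations equal to 0.6.
pcc_example = partial_canonical_correlation(0.9, 0.6, 0.6)      # (0.9 - 0.36) / 0.64
scc_example = sensitivity_canonical_correlation(0.9, 0.6, 0.6)  # 0.9 - PCC
```

If the miRNAs are uncorrelated with both groups, the partial canonical correlation equals the canonical correlation and the sensitivity canonical correlation is 0, which matches the reading that the score captures the miRNAs' contribution.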

In this study, empirically, a lncRNA-mRNA co-expressed module with p-value < 0.05 for the hypergeometric test of miRNA sharing (criterion 1), CC_lncR-mR > 0.8 for modular competition strength assessment (criterion 2) and SCC_lncR-mR > 0.1 for miRNA influence (criterion 3) is regarded as a lncRNA related miRNA sponge module (a LMSM module).
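The sequential filtering with these thresholds can be sketched as follows. The statistics per module are assumed to be precomputed; the function name, the module names and all numeric values are hypothetical and serve only to show the order in which the criteria are checked.

```python
def first_failed_criterion(sharing_pvalue, cc, scc,
                           p_cut=0.05, cc_cut=0.8, scc_cut=0.1):
    """Return 0 if all three criteria pass, else the number of the first failure."""
    if not sharing_pvalue < p_cut:   # criterion 1: significant miRNA sharing
        return 1
    if not cc > cc_cut:              # criterion 2: high canonical correlation
        return 2
    if not scc > scc_cut:            # criterion 3: adequate sensitivity canonical correlation
        return 3
    return 0


# Hypothetical co-expression modules: (sharing p-value, CC, SCC).
candidates = {
    "module_A": (0.001, 0.93, 0.25),
    "module_B": (0.200, 0.95, 0.30),
    "module_C": (0.010, 0.75, 0.30),
    "module_D": (0.010, 0.90, 0.05),
}
lmsm_modules = [
    name for name, stats in candidates.items()
    if first_failed_criterion(*stats) == 0
]
```

Checking the criteria in order mirrors the description in the overview: a module that fails an earlier criterion is discarded without evaluating the later, more expensive ones.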

Evaluating statistical significance of LMSM modules

To evaluate the statistical significance of LMSM modules, we adapt the null model method proposed in [25]. The null model method hypothesizes that the shared miRNAs do not affect the correlation between two genes, i.e. the sensitivity correlation (the difference between correlation and partial correlation) between two genes is 0, and has been successfully applied to evaluate statistical significance of ceRNA interactions. Similar to [25], LMSM is also adapted from the Sensitivity Correlation (SC) method [24]. Therefore, the null model method can be applied to evaluate the statistical significance of LMSM modules. In our null model, the null hypothesis is that the group of the shared miRNAs does not influence the canonical correlation between the group of lncRNAs and the group of mRNAs, i.e. SCC_lncR-mR = 0. For each LMSM module, a group of lncRNAs or a group of mRNAs corresponds to a gene, and a group of the shared miRNAs corresponds to a miRNA in the null model. For obtaining more precise p-values, the number of datasets sampled is set to 1E+06 for the null model. Since the sampling procedure is computationally intensive, we use the pre-computed sets of covariance matrices in SPONGE R package [25] to build our null model. Based on the constructed null model, we can infer adjusted p-values (adjusted by Benjamini and Hochberg method [26]) for each LMSM module. A LMSM module with adjusted p-value less than 0.05 is regarded as a statistically significant module.
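The Benjamini and Hochberg adjustment mentioned above can be sketched in a few lines. This is a generic implementation of the step-up procedure, not the code used in the paper; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four modules.
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

One property worth noting: when all raw p-values are equal, the adjustment leaves them unchanged, since the largest rank contributes the factor n/n = 1 and the running minimum carries that value down to all smaller ranks.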

Application of LMSM in BRCA

BRCA enrichment analysis

Instead of performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) enrichment analysis, to investigate whether an identified LMSM module is functionally associated with BRCA, we focus on conducting BRCA enrichment analysis by using a hypergeometric test. For a LMSM module, the p-value for the test is calculated as:

$$p\text{-value} = 1 - \sum_{i_{2} = 0}^{L_{2} - 1} \frac{\binom{M_{2}}{i_{2}} \binom{N_{2} - M_{2}}{K_{2} - i_{2}}}{\binom{N_{2}}{K_{2}}} \tag{5}$$

where N₂ is the number of genes (lncRNAs and mRNAs) in the dataset, M₂ denotes the number of BRCA genes in the dataset, K₂ represents the number of genes in the LMSM module, and L₂ is the number of BRCA genes in the LMSM module. A LMSM module with p-value < 0.05 is regarded as a BRCA-related module.

Module biomarker identification in BRCA

The module survival analysis can indicate whether the identified LMSM modules are good biomarkers of the metastasis risks of cancer patients, and it can hint at whether the LMSM modules may be related to, and potentially affect, the metastasis or survival of cancer patients. For each BRCA sample, we fit the multivariate Cox model (proportional hazards regression model) [27] using the genes (lncRNAs and mRNAs) in LMSM modules to compute its risk score. All the BRCA samples are equally divided into the high risk and the low risk groups according to their risk scores. The Log-rank test is used to evaluate the difference of each LMSM module between the high and the low risk BRCA groups. Moreover, we also calculate the proportional hazard ratio (HR) between the high and the low risk BRCA groups. In this work, the survival R package [28] is utilized, and a LMSM module with Log-rank p-value < 0.05 and HR > 2 is regarded as a module biomarker in BRCA.
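To make the Log-rank step concrete, the sketch below computes the two-group Log-rank chi-square statistic and its p-value (1 degree of freedom) from follow-up times and event indicators. It is a generic textbook implementation written in Python for illustration, not the survival R package used in the paper; the Cox model fitting and the HR estimate are not covered, and the example data are hypothetical.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group Log-rank test; returns (chi_square, p_value).

    events are 1 for an observed event and 0 for a censored observation.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        deaths_a = sum(1 for s, e, g in samples if s == t and e == 1 and g == 0)
        deaths_b = sum(1 for s, e, g in samples if s == t and e == 1 and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:  # the variance term is undefined for a single subject at risk
            variance += at_risk_a * at_risk_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi_square, erfc(sqrt(chi_square / 2))


# Hypothetical follow-up times: the high-risk group has all events earlier.
high_risk = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low_risk = ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
chi2_example, p_example = logrank_test(*high_risk, *low_risk)
```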

Identification of BRCA subtype-specific modules

As is known, BRCA is a heterogeneous disease with several molecular subtypes, and the choice of chemotherapy for each BRCA subtype is different. This diversity indicates that the genetic regulation of each BRCA subtype is specific. To identify BRCA subtype-specific modules, we firstly identify BRCA molecular subtypes using the PAM50 classifier [29]. By using a 50-gene subtype predictor, the PAM50 classifier classifies a BRCA sample into one of the five “intrinsic” subtypes: Luminal A (LumA), Luminal B (LumB), HER2-enriched (Her2), Basal-like (Basal) or Normal-like (Normal). In this work, we use the genefu R package [30] to predict molecular subtypes of each BRCA sample in the dataset used in our study.

To identify BRCA subtype-specific LMSM modules, we firstly need to estimate the enrichment scores of LMSM modules in BRCA samples. To calculate the enrichment score of each LMSM module in BRCA samples, the gene set variation analysis (GSVA) method [31] is used. To calculate the enrichment score, the GSVA method uses the Kolmogorov-Smirnov (KS) like random walk statistic as follows:

$$v_{jk}(l) = \frac{\sum_{i = 1}^{l} |r_{ij}|^{τ} I(g(i) \in γ_{k})}{\sum_{i = 1}^{p} |r_{ij}|^{τ} I(g(i) \in γ_{k})} - \frac{\sum_{i = 1}^{l} I(g(i) \notin γ_{k})}{p - |γ_{k}|} \tag{6}$$

where $τ$ ( $τ$ = 1 by default) is the weight of the tail in the random walk, r_ij is the normalized expression-level statistics of the i-th gene in the j-th sample as defined in [31], $γ_{k}$ is the k-th LMSM module, $I (g (i) \in γ_{k})$ is the indicator function on whether the i-th gene belongs to the LMSM module $γ_{k}$ , $| γ_{k} |$ is the number of genes in the k-th LMSM module, and p is the number of genes in the dataset.

To transform the KS like random walk statistic into an enrichment score (ES, also called GSVA score), we calculate the maximum deviation from zero of the random walk of the j-th sample with respect to the k-th LMSM module in the following:

$$ES_{jk}^{\max} = v_{jk}\left[\arg\max_{l = 1, \ldots, p} \left| v_{jk}(l) \right|\right] \tag{7}$$

For each LMSM module $γ_{k}$ , the formula generates a distribution of enrichment scores that is bimodal (see the reference [31] for a more detailed description).
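The random walk and its maximum deviation can be sketched for a single sample as below. The sketch assumes the genes are already ordered by their rank in the sample and that the normalized statistics r_ij are supplied in that order; it does not reproduce the kernel-based normalization of the GSVA method [31]. Function and variable names, and the toy inputs, are hypothetical.

```python
def gsva_max_deviation(ranked_genes, rank_stats, module, tau=1.0):
    """Maximum deviation from zero of the KS-like random walk for one sample.

    ranked_genes : genes ordered by rank in the sample
    rank_stats   : r_ij values aligned with ranked_genes
    module       : set of genes in the module (assumed to be in the dataset)
    """
    in_module = [gene in module for gene in ranked_genes]
    hit_total = sum(abs(r) ** tau for r, hit in zip(rank_stats, in_module) if hit)
    miss_total = len(ranked_genes) - sum(in_module)  # p - |module|
    if hit_total == 0 or miss_total == 0:
        raise ValueError("module must cover some, but not all, ranked genes")

    hit_sum = miss_sum = 0.0
    best = 0.0
    for r, hit in zip(rank_stats, in_module):
        if hit:
            hit_sum += abs(r) ** tau
        else:
            miss_sum += 1
        v = hit_sum / hit_total - miss_sum / miss_total
        if abs(v) > abs(best):
            best = v
    return best


# Toy ranking of four genes with hypothetical statistics.
genes = ["g1", "g2", "g3", "g4"]
stats = [2.0, 1.0, 1.0, 2.0]
es_top = gsva_max_deviation(genes, stats, {"g1", "g2"})
es_bottom = gsva_max_deviation(genes, stats, {"g3", "g4"})
```

A module concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score, which is what allows modules to be classed as up-regulated or down-regulated in a sample group.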

Based on the enrichment scores of LMSM modules in each BRCA sample, we further identify two types of BRCA subtype-specific LMSM modules, up-regulated modules and down-regulated modules. For one type of regulation pattern (up or down regulation), a LMSM module is regarded as specific to a BRCA subtype. For an up-regulated BRCA subtype-specific LMSM module, the enrichment score of the LMSM module in the specific BRCA subtype samples is significantly larger than the score in the other BRCA subtype samples. For a down-regulated BRCA subtype-specific LMSM module, the enrichment score of the LMSM module in the specific BRCA subtype samples is significantly smaller than the score in the other BRCA subtype samples. For example, if a LMSM module $γ_{k}$ is up-regulated Basal-like specific, the enrichment scores of the LMSM module in Basal-like samples should be significantly larger than those in Luminal A, Luminal B, HER2-enriched and Normal-like samples. In this work, for each LMSM module, we use Welch's t-test [32] to calculate the significance p-value for the difference of the average enrichment scores between the samples of any two BRCA subtypes. Given a BRCA subtype, a LMSM module is considered as an up-regulated (or down-regulated) module specific to this BRCA subtype if the module’s average enrichment score in samples of the given subtype is higher (or lower) than the average enrichment score in samples of every other subtype and the significance p-value of the Welch’s t-test between the samples of this subtype and each other subtype is less than 0.05.
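As a minimal sketch of the statistic behind this comparison, the Welch t statistic and its Welch-Satterthwaite degrees of freedom can be computed with the standard library. The p-value itself requires the Student t distribution, which is not computed here and would come from a statistics library. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical enrichment scores of one module in two subtypes.
t_example, df_example = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike Student's t-test, this form does not assume equal variances in the two groups, which suits subtype groups of very different sizes.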

Performance of LMSM modules in classifying BRCA subtypes

In this section, to check the biological relevance of the discovered LMSM modules, we conduct module classification of BRCA subtypes. Here, classifying BRCA subtypes (LumA, LumB, Her2, Basal and Normal) is a multi-class classification (also known as a special case of multi-label classification). To understand the classification performance of the feature genes in each LMSM module, we apply a state-of-the-art multi-label learning strategy called Binary Relevance (BR) [33] implemented in the utiml R package [34] to conduct multi-label classification analysis. For the BR strategy, we use the Support Vector Machine (SVM) classifier [35] with default parameters implemented in e1071 R package [36] as the base algorithm to build the multi-label model. We select two commonly used multi-label classification measures: Subset accuracy and Hamming loss, and conduct 10-fold cross-validation to evaluate the performance of each LMSM module. In this work, Subset accuracy denotes the percentage of correct predictions and Hamming loss is the fraction of wrong predictions to the total number of predictions. Higher values of Subset accuracy and smaller values of Hamming loss indicate better classification performance. In addition, for the evaluation, we use the baseline method in [37], a commonly used multi-label classification method as the baseline for comparison. The base algorithm of the baseline method is also the SVM classifier with default parameters implemented in e1071 R package [36].
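The two measures can be sketched as follows, following their usual multi-label definitions: Subset accuracy counts samples whose predicted label set matches the true set exactly, and Hamming loss counts wrong sample-label decisions over all sample-label pairs. The function names and the toy predictions are hypothetical; the paper computes these measures with the utiml R package.

```python
def subset_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label set equals the true label set."""
    matches = sum(
        1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p)
    )
    return matches / len(true_labels)


def hamming_loss(true_labels, predicted_labels, all_labels):
    """Fraction of wrong sample-label decisions over all sample-label pairs."""
    wrong = sum(
        len(set(t) ^ set(p)) for t, p in zip(true_labels, predicted_labels)
    )
    return wrong / (len(true_labels) * len(all_labels))


subtypes = ["LumA", "LumB", "Her2", "Basal", "Normal"]
# Toy example: binary relevance may predict no label at all for a sample.
y_true = [{"LumA"}, {"Basal"}, {"Her2"}, {"LumB"}]
y_pred = [{"LumA"}, {"Basal"}, {"LumB"}, set()]
acc_example = subset_accuracy(y_true, y_pred)
loss_example = hamming_loss(y_true, y_pred, subtypes)
```

In the toy data a swapped label costs two wrong decisions and a missing label costs one, so the two measures need not move in lockstep under binary relevance.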

Results

Heterogeneous data sources

We collect matched miRNA, lncRNA and mRNA expression data, and clinical data of BRCA dataset from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/). A lncRNA or mRNA without a corresponding gene symbol in the expression data of BRCA dataset is removed. To obtain a unique expression value for replicates of miRNAs, lncRNAs or mRNAs, we compute the average expression value of the replicates. As a result, we obtain the matched expression data of 674 miRNAs, 12711 lncRNAs and 18344 mRNAs in 500 BRCA samples.

We retrieve putative miRNA-target interactions (including miRNA-lncRNA and miRNA-mRNA interactions) from several high-confidence miRNA-target interaction databases and use the combined database search results. Specifically, the putative miRNA-lncRNA interactions are obtained from NPInter v3.0 [38] and the experimental module of DIANA-LncBase v2.0 [39], and miRNA-mRNA interactions are from miRTarBase v8.0 [40], TarBase v7.0 [41] and miRWalk v2.0 [42].

The BRCA related mRNAs are from DisGeNET v5.0 [43] and COSMIC v86 [44], and the BRCA related lncRNAs are from LncRNADisease v2.0 [45], Lnc2Cancer v2.0 [46] and MNDR v2.0 [47]. The ground truth of lncRNA related miRNA sponge interactions is obtained by integrating the interactions from miRSponge [48], LncCeRBase [49] and LncACTdb v2.0 [50].

Most of the mediating miRNAs act as crosslinks across LMSM modules

Following the steps shown in Fig 2, we have identified 17 LMSM modules (details can be seen in S1 Data). The average size of the identified modules is 672.53 and the average number of the shared miRNAs in a module is 232.82. In total, there are 549 unique miRNAs mediating the 17 LMSM modules, and 90.16% (495 out of 549) miRNAs mediate at least two LMSM modules (details can be seen in S2 Data). This result indicates that most of the mediating miRNAs act as crosslinks across different LMSM modules.

LMSM modules are all statistically significant

In this section, by computing null-model-based p-values, we evaluate whether the identified LMSM modules are statistically significant or not. As a result, the adjusted p-values for the identified 17 LMSM modules (details can be seen in S3 Data) are all statistically significant (adjusted p-value = 1.00E-06). This result demonstrates that LMSM modules are all statistically significant.

Most of LMSM modules are implicated in BRCA

To investigate whether the identified LMSM modules are related to BRCA or not, we conduct BRCA enrichment analysis and identify BRCA module biomarkers using the methods described in Section Materials and methods. For the BRCA enrichment analysis, we have collected a list of 4819 BRCA genes (734 BRCA lncRNAs and 4085 BRCA mRNAs) associated with the matched lncRNA and mRNA expression data (details in S4 Data). As shown in Table 1, 10 out of 17 LMSM modules are functionally enriched in BRCA at a significant level (p-value < 0.05). In Table 2, 15 out of 17 LMSM modules are regarded as module biomarkers in BRCA at a significant level (Log-rank p-value < 0.05 and HR > 2). In particular, 90% (9 out of 10, all except LMSM 14) of the BRCA-related LMSM modules can act as module biomarkers in BRCA. These results show that most LMSM modules are functionally implicated in BRCA.

Table 1. BRCA-related LMSM modules.

L₂ is the number of BRCA genes in each LMSM module, K₂ represents the number of genes in each LMSM module, the number of BRCA genes in the dataset (M₂) is 4819, and the number of genes in the dataset (N₂) is 31055.

Module ID	L₂	K₂	p-value
LMSM 2	327	1338	0
LMSM 3	259	1340	7.34E-05
LMSM 4	78	392	1.14E-02
LMSM 5	89	449	8.07E-03
LMSM 6	88	370	1.97E-05
LMSM 8	275	880	0
LMSM 12	24	110	4.95E-02
LMSM 13	20	76	1.05E-02
LMSM 14	252	1004	8.88E-16
LMSM 16	48	182	1.11E-04

Table 2. Survival analysis of LMSM modules in BRCA.

HRlow95 and HRup95 represent the lower and upper bounds of the 95% confidence interval of HR, respectively.

Module ID	Chi-square	p-value	HR	HRlow95	HRup95
LMSM 1	170.37	0	10.75	5.88	19.65
LMSM 2	107.34	0	6.03	3.12	11.66
LMSM 3	90.62	0	5.43	2.94	10.01
LMSM 4	138.81	0	14.94	8.83	25.27
LMSM 5	148.49	0	8.64	4.63	16.13
LMSM 6	142.64	0	13.40	7.83	22.92
LMSM 7	161.91	0	13.97	8.01	24.36
LMSM 8	103.63	0	5.91	3.07	11.37
LMSM 10	144.86	0	8.63	4.74	15.71
LMSM 11	120.79	0	9.49	5.55	16.23
LMSM 12	49.31	2.19E-12	5.46	3.38	8.80
LMSM 13	60.08	9.10E-15	5.72	3.48	9.41
LMSM 15	83.26	0	12.00	7.46	19.32
LMSM 16	110.94	0	11.25	6.79	18.66
LMSM 17	106.96	0	9.14	5.42	15.41

LMSM modules are mostly BRCA subtype-specific

In this section, we firstly divide the 500 BRCA samples into five “intrinsic” subtypes (Luminal A, Luminal B, HER2-enriched, Basal-like and Normal-like). The numbers of LumA, LumB, Her2, Basal and Normal samples are 190, 155, 52, 85 and 18, respectively. Then we calculate the enrichment scores of the identified 17 LMSM modules in the BRCA subtype samples respectively (details in S5 Data).

As illustrated in Fig 3, out of the 17 LMSM modules, 4 and 6 modules are identified as up-regulated and down-regulated BRCA subtype-specific LMSM modules, respectively. For the up-regulated BRCA subtype-specific LMSM modules, the numbers of Basal-specific, LumB-specific and Normal-specific modules are 1, 1 and 2, respectively. The numbers of Basal-specific, LumB-specific and Normal-specific modules are 3, 1 and 2 respectively among the down-regulated BRCA subtype-specific LMSM modules. In particular, only 1 module (LMSM 2) acts as both an up-regulated and a down-regulated BRCA subtype-specific LMSM module. In total, the number of unique BRCA subtype-specific LMSM modules is 9 (out of 17), indicating that most LMSM modules are BRCA subtype-specific.

The performance of LMSM modules is significantly higher than baseline’s performance in classifying BRCA subtypes

For the identified 17 LMSM modules, the average Subset accuracy and Hamming loss in classifying BRCA subtypes are 0.7547 and 0.0892, respectively (details can be seen in S6 Data). The Subset accuracy and Hamming loss of the baseline are 0.3800 and 0.2480, respectively. According to Welch’s t-test, the Subset accuracy achieved using the 17 LMSM modules is significantly larger (better) than that of the baseline (p-value < 2.20E-16), and the Hamming loss of the 17 LMSM modules is significantly smaller (better) than that of the baseline (p-value < 2.20E-16). This better performance than the baseline indicates that LMSM modules are biologically meaningful in classifying BRCA subtypes.
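To make the two measures concrete, the sketch below implements Subset accuracy (fraction of samples whose whole label vector is predicted exactly) and Hamming loss (fraction of wrongly predicted sample-label entries) using their standard definitions. As an illustration, it scores a trivial predictor that always outputs the most frequent subtype, using the subtype counts given earlier. This predictor is our assumption for illustration only; the text does not say how the baseline was computed, although the resulting values happen to match the reported baseline.

```python
SUBTYPES = ["LumA", "LumB", "Her2", "Basal", "Normal"]
COUNTS = {"LumA": 190, "LumB": 155, "Her2": 52, "Basal": 85, "Normal": 18}

def one_hot(subtype):
    """Encode a subtype as a 0/1 label vector over SUBTYPES."""
    return tuple(int(s == subtype) for s in SUBTYPES)

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label vector matches the true one exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual sample-label entries that are predicted wrongly."""
    n_labels = len(y_true[0])
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * n_labels)

# True labels for 500 samples, and a predictor that always answers "LumA".
y_true = [one_hot(s) for s in SUBTYPES for _ in range(COUNTS[s])]
y_pred = [one_hot("LumA")] * len(y_true)

print(subset_accuracy(y_true, y_pred))
print(hamming_loss(y_true, y_pred))
```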

Several lncRNA-related miRNA sponge interactions are experimentally confirmed

For the ground truth used in the validation, we have collected 581 experimentally validated lncRNA-related miRNA sponge interactions associated with the matched lncRNA and mRNA expression data (details in S4 Data). After merging the sponge lncRNA-mRNA pairs in the identified 17 LMSM modules, we have predicted 1471664 unique lncRNA-related miRNA sponge interactions (details at https://github.com/zhangjunpeng411/LMSM). For each LMSM module, the numbers of shared miRNAs, lncRNAs, mRNAs and predicted lncRNA-related miRNA sponge interactions can be seen in S7 Data.
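The merging and validation step can be sketched as follows. We assume here that every lncRNA-mRNA combination within a module counts as one predicted interaction; the text does not spell this out, so treat it as an assumption. The toy modules and the `validated` set are made up for illustration, although the gene symbols appear in the paper.

```python
from itertools import product

# Hypothetical toy modules: the groupings are invented for illustration.
modules = {
    "M1": {"lncRNAs": {"H19", "NEAT1"}, "mRNAs": {"IGF2", "VIM"}},
    "M2": {"lncRNAs": {"H19"}, "mRNAs": {"VIM", "KLF4"}},
}
# Hypothetical ground truth of validated (lncRNA, mRNA) pairs.
validated = {("H19", "IGF2"), ("DLEU2", "CCNE1")}

def predicted_pairs(modules):
    """Union of all (lncRNA, mRNA) pairs over modules; duplicates collapse in the set."""
    pairs = set()
    for module in modules.values():
        pairs.update(product(module["lncRNAs"], module["mRNAs"]))
    return pairs

pairs = predicted_pairs(modules)
confirmed = pairs & validated  # predicted pairs that are experimentally validated

print(len(pairs), sorted(confirmed))
```

The pair (H19, VIM) occurs in both toy modules but is counted once, which is what "unique" interactions means here.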

As shown in Table 3, 4 LMSM modules (LMSM 2, LMSM 3, LMSM 5 and LMSM 8) contain 14 experimentally validated lncRNA-related miRNA sponge interactions in total. Notably, all the lncRNAs and mRNAs in these confirmed lncRNA-related miRNA sponge interactions are BRCA-related genes, indicating that these interactions may be involved in BRCA.

Table 3. Validated lncRNA-related miRNA sponge interactions.

Module ID	Validated lncRNA-related miRNA sponge interactions
LMSM 2	H19:HMGA2, H19:IGF2, H19:ITGB1, H19:TGFB1, H19:VIM, H19:RUNX1, H19:CDH13, H19:KLF4, H19:TGFBI, H19:VDR
LMSM 3	LINC00152:MCL1
LMSM 5	NEAT1:EMP2
LMSM 8	LINC00324:BTG2, DLEU2:CCNE1

LMSM is capable of predicting miRNA targets

LMSM uses high-confidence miRNA-target interactions as seeds to predict miRNA-target interactions. A miRNA-mRNA or miRNA-lncRNA pair in an LMSM module has the potential to be a miRNA-target pair for the following reasons. Firstly, at the sequence level, the sponge lncRNAs and mRNAs in each LMSM module share a significant number of miRNAs. Secondly, at the expression level, the sponge lncRNAs and mRNAs in each LMSM module are highly correlated. As a result, the sponge lncRNAs and mRNAs of each LMSM module have a high chance of being target genes of the shared miRNAs. Thus, based on the identified LMSM modules, we have predicted 2820524 unique miRNA-target interactions (including 797220 miRNA-lncRNA and 2023304 miRNA-mRNA interactions) (details at https://github.com/zhangjunpeng411/LMSM). For each LMSM module, the numbers of predicted miRNA-lncRNA interactions and miRNA-mRNA interactions can be seen in S7 Data.

In addition, we investigate the intersection of the miRNA-target interactions predicted by LMSM with those of other well-cited miRNA-target prediction methods. In terms of miRNA-mRNA interactions, we select TargetScan v7.2 [51], DIANA-microT-CDS v5.0 [52], starBase v3.0 [53] and miRWalk v3.0 [54] for investigation. In terms of miRNA-lncRNA interactions, we choose starBase v3.0 [53] and DIANA-LncBase v2.0 [39]. As shown in the UpSet plot [55] of Fig 4A, the number of miRNA-mRNA interactions identified by all the five methods is only 21842. However, ~63.74% (1289620 out of 2023304) of the miRNA-mRNA interactions predicted by LMSM are also predicted by at least one of the other four methods. As shown in Fig 4B, the number of miRNA-lncRNA interactions identified by all the three methods is only 1160. Since known miRNA-lncRNA interactions are still limited, most of the miRNA-lncRNA interactions predicted by LMSM (~93.90%, 748609 out of 797220) are predicted by LMSM alone.

Fig 4. (A) Predicted miRNA-mRNA interactions shared between LMSM and TargetScan, DIANA_microT_CDS, starBase and miRWalk. (B) Predicted miRNA-lncRNA interactions shared between LMSM and starBase and DIANA_LncBase. Each column corresponds to an exclusive intersection that includes the elements of the sets denoted by the dark or red circles, but not of the others. The overlap sizes are exclusive, i.e. each one counts only the elements that belong to exactly the indicated sets.
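The exclusive intersections described in the caption can be computed by grouping each interaction by the exact combination of methods that predict it. The sketch below is a minimal illustration on made-up data; the set contents and the function name `exclusive_intersections` are hypothetical, and this is not the UpSetR implementation.

```python
from collections import Counter

def exclusive_intersections(sets_by_name):
    """Count, for each combination of set names, the elements found in exactly those sets."""
    counts = Counter()
    all_items = set().union(*sets_by_name.values())
    for item in all_items:
        members = frozenset(name for name, s in sets_by_name.items() if item in s)
        counts[members] += 1
    return counts

# Toy interaction sets; letters stand in for miRNA-target pairs.
sets_by_name = {
    "LMSM": {"a", "b", "c", "d"},
    "TargetScan": {"a", "b", "e"},
    "starBase": {"a", "f"},
}
counts = exclusive_intersections(sets_by_name)

# Interactions predicted by LMSM alone, and by LMSM plus at least one other method.
lmsm_only = counts[frozenset({"LMSM"})]
lmsm_shared = len(sets_by_name["LMSM"]) - lmsm_only
print(lmsm_only, lmsm_shared)
```

Because every element falls into exactly one exclusive intersection, the column heights of such a plot sum to the number of distinct interactions.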

Comparison with graph clustering-based strategy

A graph clustering-based strategy [12–17] is an alternative approach to identifying lncRNA related miRNA sponge modules. As there is no graph clustering-based method specifically designed for finding lncRNA related miRNA sponge modules, we create a baseline Graph Clustering-based method (called GC in this paper), which uses well-known network construction and graph clustering methods as described below. The GC method includes two steps: i) identifying the lncRNA related miRNA sponge interaction network, and ii) identifying lncRNA related miRNA sponge modules from the identified network. In step i), we adapt the well-cited Sensitivity Correlation (SC) method [24], implemented in the miRspongeR R package [56], to infer the lncRNA related miRNA sponge interaction network. A lncRNA-mRNA pair is considered an interacting pair in the network if the two RNAs have significant sharing of miRNAs, significant positive correlation and adequate sensitivity correlation. We require that a pair share at least 3 miRNAs and that its sensitivity correlation (the difference between correlation and partial correlation) be larger than 0.1. The statistical significance of the miRNA sharing and of the positive correlation is tested using the hypergeometric test and Welch's t-test, respectively, at a significance level of 0.05. In step ii), we use the well-cited Markov cluster (MCL) algorithm [57] to infer lncRNA related miRNA sponge modules. Here, each obtained cluster corresponds to a module, and each module should contain at least 2 sponge lncRNAs and 2 target mRNAs. In total, by using the GC method, we have obtained 108 lncRNA related miRNA sponge modules.
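Two of the criteria in step i) can be sketched in a few lines. The code below is a simplified illustration, not the miRspongeR implementation: it conditions the partial correlation on a single miRNA (a simplification we assume here), computes the hypergeometric upper-tail p-value for miRNA sharing, and omits the significance test of the correlation itself. All expression values and miRNA counts are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def sensitivity_corr(lnc, mrna, mirna):
    """Difference between correlation and partial correlation given the miRNA."""
    return pearson(lnc, mrna) - partial_corr(lnc, mrna, mirna)

def hypergeom_pvalue(n_total, n_lnc, n_mrna, n_shared):
    """P(X >= n_shared) when n_mrna miRNAs are drawn from n_total, n_lnc of which target the lncRNA."""
    upper = min(n_lnc, n_mrna)
    tail = sum(
        math.comb(n_lnc, k) * math.comb(n_total - n_lnc, n_mrna - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / math.comb(n_total, n_mrna)

# Toy expression profiles over six samples: both RNAs fall as the miRNA rises.
mirna = [1, 2, 3, 4, 5, 6]
lnc = [6.1, 4.8, 4.2, 2.9, 2.1, 0.9]
mrna = [6.1, 4.9, 4.2, 3.1, 1.8, 0.9]

sc = sensitivity_corr(lnc, mrna, mirna)
p_share = hypergeom_pvalue(n_total=100, n_lnc=10, n_mrna=10, n_shared=5)

# Thresholds taken from the text: at least 3 shared miRNAs, p < 0.05, SC > 0.1.
is_candidate = (5 >= 3) and (p_share < 0.05) and (sc > 0.1)
print(round(sc, 3), p_share, is_candidate)
```

In this toy case most of the lncRNA-mRNA correlation disappears once the miRNA is conditioned on, so the sensitivity correlation is well above the 0.1 cutoff.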

We compare LMSM and GC in terms of the percentage of BRCA-related modules, the percentage of module biomarkers in BRCA, the classification performance (mean Subset accuracy and mean Hamming loss) in classifying BRCA subtypes, and the number of validated lncRNA-related miRNA sponge interactions. As shown in Table 4, LMSM performs better than the GC method on all five measures. The detailed results of the GC method can be seen in S8 Data.

Table 4. Comparison results between LMSM and GC.

Method	%BRCA-related modules	%Module biomarkers	Mean Subset accuracy	Mean Hamming loss	#Validated interactions
LMSM	58.82%	88.24%	0.7547	0.0892	14
GC	32.41%	66.67%	0.6586	0.1319	2

LMSM is robust

To demonstrate the robustness of the LMSM workflow, we use the sparse group factor analysis (SGFA) method [58], instead of the WGCNA method, to identify lncRNA-mRNA co-expression modules. The SGFA method extends the group factor analysis (GFA) method [59–61]; it can reliably infer biclusters (modules) from multiple data sources and provide predictive and interpretable structure existing in any subset of the data sources. Given B biclusters to be identified, the SGFA method assigns each column (lncRNA or mRNA) or row (sample) a grade of membership (association) in each of these biclusters. The associations take values in [–1, 1]. We use the absolute value of association (AVA) to evaluate how strongly lncRNAs and mRNAs belong to a bicluster, and the cutoff of AVA is also set to 0.8. Specifically, we use the GFA R package [58] to identify lncRNA-mRNA co-expression modules. The parameter settings for inferring lncRNA-related miRNA sponge modules are the same as those used with the WGCNA method.
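The AVA cutoff amounts to a simple filter on the association values of one bicluster. A minimal sketch, assuming the cutoff is inclusive (the text does not say whether the comparison is strict) and using made-up association values and gene names:

```python
def bicluster_members(associations, cutoff=0.8):
    """Return the genes whose absolute association with a bicluster reaches the cutoff."""
    return sorted(name for name, a in associations.items() if abs(a) >= cutoff)

# Hypothetical associations of four genes with one bicluster, each in [-1, 1].
associations = {
    "lncRNA_A": 0.91,
    "mRNA_B": -0.85,   # strongly but negatively associated: kept, since AVA = 0.85
    "mRNA_C": 0.40,
    "lncRNA_D": -0.10,
}
members = bicluster_members(associations)
print(members)
```

Using the absolute value keeps genes with strong negative associations, which a signed threshold would discard.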

By using the SGFA method, we have identified 51 LMSM modules (details can be seen in S1 Data). The average size of these LMSM modules is 277.63 and the average number of the shared miRNAs is 135.65. There are 490 unique miRNAs mediating the 51 LMSM modules, and 84.90% (416 out of 490) of these miRNAs mediate at least two LMSM modules (details can be seen in S2 Data). As with the result obtained using the WGCNA method, the result with the SGFA method also implies that the mediating miRNAs mostly act as crosslinks across different LMSM modules. In addition, by using a null-model-based p-value computation method, the identified 51 LMSM modules are also all statistically significant with adjusted p-value ≤ 5.00E-06 (details can be seen in S3 Data).
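The crosslink statistic is the share of mediating miRNAs that appear in at least two modules. The sketch below shows one way to compute it from per-module miRNA lists; the toy modules are made up, and the function name `crosslink_fraction` is ours.

```python
from collections import Counter

def crosslink_fraction(module_mirnas):
    """Return (miRNAs in >= 2 modules, unique miRNAs, their ratio)."""
    # set() guards against a miRNA being listed twice within one module.
    counts = Counter(m for mirnas in module_mirnas.values() for m in set(mirnas))
    shared = sum(1 for c in counts.values() if c >= 2)
    return shared, len(counts), shared / len(counts)

# Hypothetical toy input: three modules and the miRNAs mediating each.
module_mirnas = {
    "M1": ["miR-1", "miR-2", "miR-3"],
    "M2": ["miR-2", "miR-3"],
    "M3": ["miR-3", "miR-4"],
}
shared, total, fraction = crosslink_fraction(module_mirnas)
print(shared, total, fraction)

# The percentage reported in the text follows from the same ratio.
reported = round(416 / 490 * 100, 2)
print(reported)
```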

As shown in Table A of S1 File, 3 out of the 51 LMSM modules are significantly functionally enriched in BRCA (p-value < 0.05). Moreover, 49 out of the 51 LMSM modules are regarded as module biomarkers in BRCA (see Table B of S1 File). The results indicate that most LMSM modules are related to BRCA.

We also compute the enrichment scores of the identified 51 LMSM modules in the BRCA subtype samples (details in S5 Data). As illustrated in Fig A of S1 File, out of the 51 LMSM modules, 33 and 24 modules are regarded as up-regulated and down-regulated BRCA subtype-specific LMSM modules, respectively. Among the up-regulated BRCA subtype-specific LMSM modules, the numbers of Basal-specific, Her2-specific, LumB-specific and Normal-specific modules are 27, 2, 2 and 2, respectively. Among the down-regulated BRCA subtype-specific LMSM modules, the numbers of Basal-specific, Her2-specific, LumA-specific, LumB-specific and Normal-specific modules are 2, 3, 15, 3 and 1, respectively. In particular, 16 modules act as both up-regulated and down-regulated BRCA subtype-specific LMSM modules. Overall, the number of unique BRCA subtype-specific LMSM modules is 41 (33 + 24 − 16). This result also indicates that the identified LMSM modules are mostly BRCA subtype-specific.

The average values of Subset accuracy and Hamming loss of the identified 51 LMSM modules in classifying BRCA subtypes are 0.6921 and 0.1135, respectively (details can be seen in S6 Data). In classifying BRCA subtypes, the baseline values of Subset accuracy and Hamming loss are 0.3800 and 0.2480, respectively. According to Welch’s t-test, the Subset accuracy for the 51 LMSM modules is significantly larger (better) than the baseline Subset accuracy (p-value < 2.20E-16), and the Hamming loss for the 51 LMSM modules is significantly smaller (better) than the baseline Hamming loss (p-value < 2.20E-16). This better performance than the baseline also indicates that LMSM modules are biologically meaningful in classifying BRCA subtypes.
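For reference, Welch's t-test compares two sample means without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library only; it stops short of the p-value, which needs the t distribution (for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`). The two score lists are made up and are not the values in S6 Data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for samples a and b."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical Subset accuracy values for a set of modules and for a baseline.
module_scores = [0.70, 0.72, 0.68, 0.71, 0.69]
baseline_scores = [0.38, 0.40, 0.36, 0.39, 0.37]

t_stat, dof = welch_t(module_scores, baseline_scores)
print(round(t_stat, 2), round(dof, 2))
```

A large positive statistic, as in this toy case, corresponds to a very small one-sided p-value for "module scores exceed the baseline".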

Moreover, we have predicted 605456 unique lncRNA-related miRNA sponge interactions in the identified 51 LMSM modules (details at https://github.com/zhangjunpeng411/LMSM). The numbers of shared miRNAs, lncRNAs, mRNAs and predicted lncRNA-related miRNA sponge interactions of each LMSM module can be seen in S7 Data. Since experimentally validated lncRNA-related miRNA sponge interactions are still limited, only 4 LMSM modules contain experimentally validated lncRNA-related miRNA sponge interactions, 4 in total (see Table C of S1 File). All lncRNAs and mRNAs in the confirmed lncRNA-related miRNA sponge interactions are also BRCA-related genes.

LMSM has also identified a large number of potential miRNA-target interactions (1646449 in total, including 435345 miRNA-mRNA and 1211104 miRNA-lncRNA interactions, details at https://github.com/zhangjunpeng411/LMSM). The numbers of predicted miRNA-lncRNA interactions, predicted miRNA-mRNA interactions, putative miRNA-lncRNA interactions and putative miRNA-mRNA interactions can be seen in S7 Data. As illustrated in Fig B of S1 File, the number of miRNA-mRNA interactions identified by all the five methods is 4897 and the number of miRNA-lncRNA interactions identified by all the three methods is 1149. Most of the miRNA-mRNA interactions identified by LMSM (~58.55%, 254910 out of 435345) are also predicted by at least one of the other four methods. Of the predicted miRNA-lncRNA interactions, ~94.23% (1141232 out of 1211104) are predicted by LMSM alone.

Finally, in terms of the percentage of BRCA-related modules, the percentage of module biomarkers in BRCA, the classification performance (mean Subset accuracy and mean Hamming loss) in classifying BRCA subtypes, and the number of validated lncRNA-related miRNA sponge interactions, LMSM also generally performs better than the GC method (see Table D of S1 File).

Altogether, the above results are consistent with those obtained using the WGCNA method, indicating that our LMSM workflow is robust for studying lncRNA-related miRNA sponge modules.

Discussion

The crosstalk between different RNA transcripts in a miRNA-dependent manner forms a complex miRNA sponge interaction network and depicts a novel layer of gene expression regulation. Until now, several types of RNA transcripts, e.g. lncRNAs, pseudogenes, circRNAs and mRNAs, have been confirmed to act as miRNA sponges. Since lncRNAs are a large class of ncRNAs and function in many aspects of cell biology, including human cancers, we focus on identifying lncRNA related miRNA sponge modules in this work.

By integrating multiple data sources, previous studies mainly investigate the identification of lncRNA related miRNA sponge interaction networks. Based on the identified network, they use graph clustering algorithms to further infer lncRNA related miRNA sponge modules. Unlike these existing computational methods, in this work we propose a novel method named LMSM to directly identify lncRNA related miRNA sponge modules from heterogeneous data. Note that the LMSM method depends on our proposed hypothesis of miRNA sponge modular competition. Under this hypothesis, miRNA sponges tend to form a group to compete with a group of target mRNAs for binding with miRNAs.

We have applied the LMSM method to the BRCA dataset from TCGA. For the putative miRNA-target interactions, we integrate high-confidence miRNA-target interactions from several databases. The analysis results demonstrate that our LMSM method is useful in identifying lncRNA related miRNA sponge modules, and that it can help with understanding the regulatory mechanisms of lncRNAs.

LMSM is a flexible method to investigate miRNA sponge modules in human cancer. Firstly, any biclustering or clustering algorithm (e.g. the joint non-negative matrix factorization methods presented by Deng et al. [18] and Xiao et al. [19]) can be plugged into stage 1 of LMSM to identify lncRNA-mRNA co-expression modules. The only condition is that the algorithm can identify biclusters or clusters from high-dimensional expression data. Secondly, LMSM is a parametric model, and its parameter settings can be changed according to the practical requirements of researchers. For example, the thresholds of the three metrics in stage 2 for identifying lncRNA related miRNA sponge modules can be looser or stricter. Thirdly, LMSM can also be extended to study other ncRNA (e.g. circRNA and pseudogene) related miRNA sponge modules. For instance, if we replace the matched lncRNA expression data and the miRNA-lncRNA interactions with matched circRNA expression data and miRNA-circRNA interactions, respectively, the LMSM pipeline identifies circRNA related miRNA sponge modules.
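The plug-in design described above can be sketched as a function that accepts the stage 1 clustering step and the stage 2 filters as arguments. Everything below is hypothetical scaffolding of ours, not the LMSM code: `toy_cluster` stands in for any clustering algorithm, and the two size filters are placeholders rather than the three stage 2 metrics used by LMSM.

```python
from typing import Callable, Iterable, List, Sequence

Module = List[str]

def identify_modules(
    genes: Sequence[str],
    cluster_fn: Callable[[Sequence[str]], Iterable[Module]],
    stage2_checks: Sequence[Callable[[Module], bool]],
) -> List[Module]:
    """Stage 1: run any clustering callable. Stage 2: keep modules that pass every check."""
    candidates = cluster_fn(genes)
    return [m for m in candidates if all(check(m) for check in stage2_checks)]

def toy_cluster(genes):
    """Placeholder clustering: split the gene list into two halves."""
    half = len(genes) // 2
    return [list(genes[:half]), list(genes[half:])]

def has_min_lncrnas(module, minimum=2):
    """Placeholder filter: require a minimum number of lncRNAs (names starting with 'lnc')."""
    return sum(g.startswith("lnc") for g in module) >= minimum

def has_min_mrnas(module, minimum=2):
    """Placeholder filter: require a minimum number of mRNAs (names starting with 'mRNA')."""
    return sum(g.startswith("mRNA") for g in module) >= minimum

genes = ["lncA", "lncB", "mRNA1", "mRNA2", "lncC", "mRNA3", "mRNA4", "mRNA5"]
modules = identify_modules(genes, toy_cluster, [has_min_lncrnas, has_min_mrnas])
print(modules)
```

Swapping in another clustering method or stricter thresholds then only means passing a different `cluster_fn` or a different list of checks.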

Each LMSM module contains many sponge lncRNAs and mRNAs, so it is hard to validate such a module with follow-up wet-lab experiments. This is a common issue of existing computational methods, including LMSM. We suggest that biologists select some sponge lncRNAs and mRNAs of interest in each LMSM module, and then validate the modular competition between the selected sponge lncRNAs and target mRNAs. We believe that LMSM is still useful in shortlisting high-confidence sponge lncRNAs and mRNAs for experimental validation. For example, a previous study [62] has shown that lncRNA MIR22HG is functionally complementary to lncRNA H19. In the identified LMSM module no. 2 (LMSM 2), lncRNA H19 is experimentally validated to compete with 10 target mRNAs (HMGA2, IGF2, ITGB1, TGFB1, VIM, RUNX1, CDH13, KLF4, TGFBI and VDR). Thus, biologists can select the 2 lncRNAs (H19 and MIR22HG) and these 10 target mRNAs in LMSM 2 to validate the modular competition between them.

Taken together, based on the hypothesis of miRNA sponge modular competition, we propose a new approach to identifying lncRNA related miRNA sponge modules by integrating expression data and miRNA-target binding information. Our method not only extends the ceRNA hypothesis, but also provides a novel way to investigate the biological functions and modular mechanism of lncRNAs in BRCA. We believe that our method can also be applied to other human cancer datasets to assist human cancer research.

Supporting information

S1 Data. The identified LMSM modules.

(XLSX)

S2 Data. The distribution of the shared miRNAs in LMSM modules.

(XLSX)

S3 Data. Statistically significant analysis results of LMSM modules.

(XLSX)

S4 Data. BRCA-related genes and experimentally validated lncRNA related miRNA sponge interactions.

(XLSX)

S5 Data. The enrichment scores of the identified LMSM modules in the BRCA subtype samples.

(XLSX)

S6 Data. Classification analysis results of LMSM modules in classifying BRCA subtypes.

(XLSX)

S7 Data. The number of shared miRNAs, lncRNAs, mRNAs, predicted interactions for each LMSM module.

(XLSX)

S8 Data. The results of a graph clustering-based strategy.

(XLSX)

S1 File. Supporting file.

Supplementary file.

(DOCX)

Acknowledgments

We thank Prof. Guanwen Fang for the support from the Yunnan young and middle-aged academic and technical leaders reserve talent program and the Yunnan ten thousand talent program-young top-notch talent.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

JZ was supported by the National Natural Science Foundation of China (Grant Number: 61702069, 61963001), the Applied Basic Research Foundation of Science and Technology of Yunnan Province (Grant Number: 2017FB099). LL and JL were supported by the Australian Research Council Discovery Grant (Grant Number: DP170101306). TX was supported by the National Natural Science Foundation of China (Grant Number: 61902372). WZ was supported by the Education Science Research Foundation of Yunnan Province (Grant Number: 2018JS416). NR was supported by the National Natural Science Foundation of China (Grant Number: 61872405, 61720106004). TDL was supported by NHMRC Grant (Grant Number: 1123042). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 2012; 22(9):1775–1789. doi:10.1101/gr.132159.111
2. Fang Y, Fullwood MJ. Roles, functions, and mechanisms of long non-coding RNAs in cancer. Genomics Proteomics Bioinformatics 2016; 14(1):42–54. doi:10.1016/j.gpb.2015.09.006
3. Bhan A, Soleimani M, Mandal SS. Long noncoding RNA and cancer: a new paradigm. Cancer Res 2017; 77(15):3965–81. doi:10.1158/0008-5472.CAN-16-2634
4. Kopp F, Mendell JT. Functional classification and experimental dissection of long noncoding RNAs. Cell 2018; 172(3):393–407. doi:10.1016/j.cell.2018.01.011
5. Ambros V. The functions of animal microRNAs. Nature 2004; 431(7006):350–5. doi:10.1038/nature02871
6. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004; 116(2):281–97. doi:10.1016/s0092-8674(04)00045-5
7. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 2011; 146(3):353–8. doi:10.1016/j.cell.2011.07.014
8. Tay Y, Rinn J, Pandolfi PP. The multilayered complexity of ceRNA crosstalk and competition. Nature 2014; 505(7483):344–52. doi:10.1038/nature12986
9. Zhou S, He Y, Yang S, Hu J, Zhang Q, Chen W, et al. The regulatory roles of lncRNAs in the process of breast cancer invasion and metastasis. Biosci Rep 2018; 38(5):BSR20180772. doi:10.1042/BSR20180772
10. Peng F, Li TT, Wang KL, Xiao GQ, Wang JH, Zhao HD, et al. H19/let-7/LIN28 reciprocal negative regulatory circuit promotes breast cancer stem cell maintenance. Cell Death Dis 2017; 8(1):e2569. doi:10.1038/cddis.2016.438
11. Le TD, Zhang J, Liu L, Li J. Computational methods for identifying miRNA sponge interactions. Brief Bioinform 2017; 18(4):577–590. doi:10.1093/bib/bbw042
12. Shao T, Wu A, Chen J, Chen H, Lu J, Bai J, et al. Identification of module biomarkers from the dysregulated ceRNA-ceRNA interaction network in lung adenocarcinoma. Mol Biosyst 2015; 11(11):3048–58. doi:10.1039/c5mb00364d
13. Zhang Y, Xu Y, Feng L, Li F, Sun Z, Wu T, et al. Comprehensive characterization of lncRNA-mRNA related ceRNA network across 12 major cancers. Oncotarget 2016; 7(39):64148–67. doi:10.18632/oncotarget.11637
14. Zhang J, Le TD, Liu L, Li J. Inferring miRNA sponge co-regulation of protein-protein interactions in human breast cancer. BMC Bioinformatics 2017; 18(1):243. doi:10.1186/s12859-017-1672-2
15. Wang H, Xu D, Huang H, Cui Y, Li C, Zhang C, et al. Detection of dysregulated competing endogenous RNA modules associated with clear cell kidney carcinoma. Mol Med Rep 2018; 18(2):1963–72. doi:10.3892/mmr.2018.9189
16. Do D, Bozdag S. Cancerin: A computational pipeline to infer cancer-associated ceRNA interaction networks. PLoS Comput Biol 2018; 14(7):e1006318. doi:10.1371/journal.pcbi.1006318
17. Zhang J, Liu L, Li J, Le TD. LncmiRSRN: identification and analysis of long non-coding RNA related miRNA sponge regulatory network in human cancer. Bioinformatics 2018; 34(24):4232–40. doi:10.1093/bioinformatics/bty525
18. Deng J, Kong W, Wang S, Mou X, Zeng W. Prior knowledge driven joint NMF algorithm for ceRNA co-module identification. Int J Biol Sci 2018; 14(13):1822–1833. doi:10.7150/ijbs.27555
19. Xiao Q, Luo J, Liang C, Cai J, Li G, Cao B. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics 2019; 20(1):67. doi:10.1186/s12859-019-2654-3
20. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9:559. doi:10.1186/1471-2105-9-559
21. Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003; 34(2):166–76. doi:10.1038/ng1165
22. Hotelling H. Relations between two sets of variates. Biometrika 1936; 28(3/4):321–377. doi:10.2307/2333955
23. Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 2009; 10(3):515–34. doi:10.1093/biostatistics/kxp008
24. Paci P, Colombo T, Farina L. Computational analysis identifies a sponge interaction network between long non-coding RNAs and messenger RNAs in human breast cancer. BMC Syst Biol 2014; 8:83. doi:10.1186/1752-0509-8-83
25. List M, Dehghani Amirabad A, Kostka D, Schulz MH. Large-scale inference of competing endogenous RNA networks with sparse partial correlation. Bioinformatics 2019; 35(14):i596–i604. doi:10.1093/bioinformatics/btz314
26. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 1995; 57(1):289–300. doi:10.2307/2346101
27. Andersen P, Gill R. Cox's regression model for counting processes, a large sample study. Ann Stat 1982; 10(4):1100–20. doi:10.1214/aos/1176345976
28. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. Springer-Verlag, New York, 2000.
29. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009; 27(8):1160–7. doi:10.1200/JCO.2008.18.1370
30. Gendoo DM, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 2016; 32(7):1097–9. doi:10.1093/bioinformatics/btv693
31. Hänzelmann S. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013; 14:7. doi:10.1186/1471-2105-14-7
32. Welch BL. The generalisation of student's problems when several different population variances are involved. Biometrika 1947; 34(1–2):28–35. doi:10.1093/biomet/34.1-2.28
33. Tsoumakas G, Katakis I, Vlahavas I. Mining Multi-Label Data. In Maimon O. and Rokach L., editors, Data Mining and Knowledge Discovery Handbook, chapter 34, pages 667–685. Springer-Verlag, 2 edition, 2010. ISBN 0387244352. doi:10.1007/978-0-387-09823-4_34
34. Rivolli A, de Carvalho AC. The utiml package: Multi-label classification in R. The R Journal 2018; 10(2):24–37. doi:10.32614/RJ-2018-041
35. Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM T Intel Syst Tec. 2011; 2(3):1–27. doi:10.1145/1961189.1961199
36. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: Misc functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien, R package version, 1.7–3. 2019. https://CRAN.R-project.org/package=e1071.
37. Metz J, de Abreu LF, Cherman EA, Monard MC. On the estimation of predictive evaluation measure baselines for multi-label learning. In 13th Ibero-American Conference on AI, pages 189–198, Cartagena de Indias, Colombia, 2012. doi:10.1007/978-3-642-34654-5_20
38. Hao Y, Wu W, Li H, Yuan J, Luo J, Zhao Y, et al. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford) 2016. doi:10.1093/database/baw057
39. Paraskevopoulou MD, Vlachos IS, Karagkouni D, Georgakilas G, Kanellos I, Vergoulis T, et al. DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts. Nucleic Acids Res 2016; 44(Database issue):D231–D238. doi:10.1093/nar/gkv1270
40. Huang HY, Lin YC, Li J, Huang KY, Shrestha S, Hong HC, et al. miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic Acids Res 2019. doi:10.1093/nar/gkz896
41. Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA: mRNA interactions. Nucleic Acids Res 2015; 43(Database issue):D153–D159. doi:10.1093/nar/gku1215
42. Dweep H, Gretz N. miRWalk2.0: a comprehensive atlas of microRNA-target interactions. Nat Methods 2015; 12(8):697. doi:10.1038/nmeth.3485
43. Piñero J, Bravo À, Queralt-Rosinach N, Gutiérrez-Sacristán A, Deu-Pons J, Centeno E, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res 2017; 45(Database issue):D833–D839. doi:10.1093/nar/gkw943
44. Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res 2017; 45(Database issue):D777–D783. doi:10.1093/nar/gkw1121
45. Bao Z, Yang Z, Huang Z, Zhou Y, Cui Q, Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res 2019; 47(Database issue):D1034–D1037. doi:10.1093/nar/gky905
46. Gao Y, Wang P, Wang Y, Ma X, Zhi H, Zhou D, et al. Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers. Nucleic Acids Res 2019; 47(Database issue):D1028–D1033. doi:10.1093/nar/gky1096
47. Cui T, Zhang L, Huang Y, Yi Y, Tan P, Zhao Y, et al. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res 2018; 46(Database issue):D371–D374. doi:10.1093/nar/gkx1025
48. Wang P, Zhi H, Zhang Y, Liu Y, Zhang J, Gao Y, et al. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs. Database (Oxford) 2015. doi:10.1093/database/bav098
49. Pian C, Zhang G, Tu T, Ma X, Li F. LncCeRBase: a database of experimentally validated human competing endogenous long non-coding RNAs. Database (Oxford) 2018. doi:10.1093/database/bay061
50. Wang P, Li X, Gao Y, Guo Q, Wang Y, Fang Y, et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res 2019; 47(Database issue):D121–D127. doi:10.1093/nar/gky1144
51. Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. Elife 2015; 4. doi:10.7554/eLife.05005
52. Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, et al. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 2013; 41(Web Server issue):W169–73. doi:10.1093/nar/gkt393
53. Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2014; 42(Database issue):D92–7. doi:10.1093/nar/gkt1248
54. Sticht C, De La Torre C, Parveen A, Gretz N. miRWalk: An online resource for prediction of microRNA binding sites. PLoS One 2018; 13(10):e0206239. doi:10.1371/journal.pone.0206239
55. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 2017; 33(18):2938–2940. doi:10.1093/bioinformatics/btx364
56. Zhang J, Liu L, Xu T, Xie Y, Zhao C, Li J, et al. miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules. BMC Bioinformatics 2019; 20(1):235. doi:10.1186/s12859-019-2861-y
57. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002; 30(7):1575–1584. doi:10.1093/nar/30.7.1575
58. Bunte K, Leppäaho E, Saarinen I, Kaski S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 2016; 32(16):2457–63. doi:10.1093/bioinformatics/btw207
59. Klami A, Virtanen S, Leppäaho E, Kaski S. Group factor analysis. IEEE Trans Neural Netw Learn Syst 2015; 26(9):2136–2147. doi:10.1109/TNNLS.2014.2376974
60. Suvitaival T, Parkkinen JA, Virtanen S, Kaski S. Cross-organism toxicogenomics with group factor analysis. Syst Biomed 2014; 2(4):71–80. doi:10.4161/sysb.29291
61. Virtanen S, Klami A, Khan S, Kaski S. Bayesian group factor analysis. In: Lawrence N, Girolami M (eds), Proc. of the 15th International Conference on Artificial Intelligence and Statistics, 2012; pp. 1269–1277.
62. Wang P, Ning S, Zhang Y, Li R, Ye J, Zhao Z, et al. Identification of lncRNA-associated competing triplets reveals global patterns and prognostic markers for cancer. Nucleic Acids Res 2015; 43(7):3478–89. doi:10.1093/nar/gkv233

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007851.r001

Decision Letter 0

Teresa M Przytycka, William Stafford Noble

20 Jan 2020

Dear Prof. Zhang,

Thank you very much for submitting your manuscript "LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Teresa M. Przytycka

Associate Editor

PLOS Computational Biology

William Noble

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper describes a framework, LMSM, to identify LncRNA related MiRNA Sponge Modules from heterogeneous data. To understand the miRNA sponging activities in biological conditions, LMSM uses gene expression data to evaluate the influence of the shared miRNAs on the clustered sponge lncRNAs and mRNAs. This may be an important paper in the field; however:

1) I do not see any control experiments. There is some overlap with previous results, but the statistical significance of this overlap is not clear at all. There are many lncRNAs, so some overlap is expected by chance.

2) There is no attempt to check the biological relevance of the obtained module classification, so I cannot judge the quality of the classification. Perhaps some experiments would improve the paper.

Reviewer #2: In their manuscript "LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer", Zhang et al. present a novel method for investigating the competing endogenous RNA effect of long non-coding RNAs (lncRNAs) at the level of modules. Here LMSM differs from existing methods that focus on pairwise considerations of lncRNAs and mRNAs. The authors refer to this as the miRNA sponge modular competition hypothesis. To obtain modules, LMSM uses the well-established method WGCNA. Modules containing lncRNAs as well as mRNAs (considered as two groups) are then evaluated using three criteria: (i) a significant number of shared miRNAs, assessed using the hypergeometric test; (ii) canonical correlation analysis to assess the influence of the lncRNA group over the mRNA group; and (iii) miRNA expression data, which are included to compute the partial-correlation-based sensitivity correlation first introduced by Paci et al. Sensitivity correlation is modified here to use partial canonical correlation, so as to consider the effect of the lncRNA group on the mRNA group.
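
Criterion (i) in the summary above can be illustrated with a minimal sketch, assuming the standard upper-tail hypergeometric test on sets of miRNAs. The function name `shared_mirna_pvalue` and all of its parameters are hypothetical and chosen for illustration only; this is not the authors' implementation, and how LMSM defines the miRNA universe is not specified in this text.

```python
from math import comb


def shared_mirna_pvalue(total_mirnas, lnc_mirnas, mrna_mirnas, shared):
    """Upper-tail hypergeometric p-value P(X >= shared).

    total_mirnas: size of the miRNA universe (an assumption of this sketch)
    lnc_mirnas:   number of miRNAs targeting the lncRNA group
    mrna_mirnas:  number of miRNAs targeting the mRNA group
    shared:       observed number of miRNAs common to both groups
    """
    max_shared = min(lnc_mirnas, mrna_mirnas)
    # Number of ways to choose the mRNA-group miRNAs from the universe.
    denom = comb(total_mirnas, mrna_mirnas)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(lnc_mirnas, k) * comb(total_mirnas - lnc_mirnas, mrna_mirnas - k)
        for k in range(shared, max_shared + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers, not taken from the manuscript.
    p = shared_mirna_pvalue(total_mirnas=500, lnc_mirnas=40, mrna_mirnas=60, shared=15)
    print(f"p-value for the toy example: {p:.3g}")
```

A small p-value would indicate that the two groups share more miRNAs than expected by chance; the significance cutoff and any minimum number of shared miRNAs are left to the caller in this sketch.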

# Major:

- While it is appreciated that the authors released their source code on github, the software should also be documented, i.e. a README with installation instructions, dependencies and usage examples is needed.

- The authors differentiate between competing RNAs such as lncRNA or circRNAs and the mRNAs as a target group. I do not agree with this view as it neglects the fact that mRNAs act as competitors as well. For example, a highly expressed protein-coding mRNA may sponge miRNAs to increase the availability of other mRNAs coding for important interacting proteins, thus allowing the formation of protein complexes. This limitation should be discussed more critically in the manuscript.

- In the Methods, the authors state that "the available resources of lncRNAs are more abundant than those of other coding RNAs, circRNAs and pseudogenes". I do not understand this statement as clearly coding mRNAs are more abundant than lncRNAs.

- In WGCNA, R² is empirically set to 0.8. Can you elaborate on what you mean by "empirically"?

- In the third module assessment step, sensitivity correlation was used to quantify the sponging effect based on matched gene and miRNA expression data. We recently showed that a major issue with sensitivity correlation is that it is biased by several factors, including gene-gene correlation, sample number and number of shared miRNAs (List et al., https://doi.org/10.1093/bioinformatics/btz314). This bias can be adjusted for by using a suitable null model for inferring significance. Could the null model approach described in the above paper be adapted to the use of partial canonical correlation?

- What is the source for the interactions of H19 and NEAT1? H19 and IGF2 for example share a complex relationship in imprinting that is not likely explainable by miRNA regulation. Thus, the results of module 2 should be seen critically.

- The main innovation of this method is that it infers miRNA sponge modules directly rather than first generating a comprehensive network which is then split into modules via graph clustering approaches. While this is indeed a promising strategy, the manuscript lacks a quantitative comparison to show advantages over the graph clustering strategy.

# Minor

- Table 1, column N2 is superfluous; N2 can be mentioned once in the caption.

- The interaction H19:TGFB1 is listed twice in Table 3.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Markus List

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology, see http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. 2020 Apr 23;16(4):e1007851. doi: 10.1371/journal.pcbi.1007851.r002

Author response to Decision Letter 0

25 Feb 2020

Attachment

Submitted filename: Response to Reviewers.docx (DOCX, 38.5 KB)

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007851.r003

Decision Letter 1

Teresa M Przytycka, William Stafford Noble

6 Apr 2020

Dear Prof. Zhang,

We are pleased to inform you that your manuscript 'LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Teresa M. Przytycka

Associate Editor

PLOS Computational Biology

William Noble

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have tried to implement the suggestions; the paper is acceptable.

Reviewer #2: All my previous concerns have been addressed adequately. The inclusion of a null model as well as a practical use case (breast cancer subtype classification) strongly increases confidence in the significance of the extracted modules. A comparison against the graph clustering approach is also highly appreciated and strengthens the manuscript considerably.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Markus List

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007851.r004

Acceptance letter

Teresa M Przytycka, William Stafford Noble

10 Apr 2020

PCOMPBIOL-D-19-01970R1

LMSM: a modular approach for identifying lncRNA related miRNA sponge modules in breast cancer

Dear Dr Zhang,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Sarah Hammond

Associated Data

Supplementary Materials

S1 Data. The identified LMSM modules. (XLSX, 171.9 KB)

S2 Data. The distribution of the shared miRNAs in LMSM modules. (XLSX, 43.7 KB)

S3 Data. Statistically significant analysis results of LMSM modules. (XLSX, 16.5 KB)

S4 Data. BRCA-related genes and experimentally validated lncRNA related miRNA sponge interactions. (XLSX, 78.7 KB)

S5 Data. The enrichment scores of the identified LMSM modules in the BRCA subtype samples. (XLSX, 484.9 KB)

S6 Data. Classification analysis results of LMSM modules in classifying BRCA subtypes. (XLSX, 13.2 KB)

S7 Data. The number of shared miRNAs, lncRNAs, mRNAs, predicted interactions for each LMSM module. (XLSX, 15.7 KB)

S8 Data. The results of a graph clustering-based strategy. (XLSX, 68.3 KB)

S1 File. Supporting file. (DOCX, 475.2 KB)

Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

[pcbi.1007851.ref001] 1.Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 2012; 22(9):1775–1789. 10.1101/gr.132159.111 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref002] 2.Fang Y, Fullwood MJ. Roles, functions, and mechanisms of long non-coding RNAs in cancer. Genomics Proteomics Bioinformatics 2016; 14(1):42–54. 10.1016/j.gpb.2015.09.006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref003] 3.Bhan A, Soleimani M, Mandal SS. Long noncoding RNA and cancer: a new paradigm. Cancer Res 2017; 77(15):3965–81. 10.1158/0008-5472.CAN-16-2634 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref004] 4.Kopp F, Mendell JT. (2018) Functional classification and experimental dissection of long noncoding RNAs. Cell 2018; 172(3):393–407. 10.1016/j.cell.2018.01.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref005] 5.Ambros V. The functions of animal microRNAs. Nature 2004; 431(7006):350–5. 10.1038/nature02871 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref006] 6.Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004; 116(2):281–97. 10.1016/s0092-8674(04)00045-5 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref007] 7.Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 2011; 146(3):353–8. 10.1016/j.cell.2011.07.014 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref008] 8.Tay Y, Rinn J, Pandolfi PP. The multilayered complexity of ceRNA crosstalk and competition. Nature 2014; 505(7483):344–52. 10.1038/nature12986 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref009] 9.Zhou S, He Y, Yang S, Hu J, Zhang Q, Chen W, et al. The regulatory roles of lncRNAs in the process of breast cancer invasion and metastasis. Biosci Rep 2018; 38(5):BSR20180772 10.1042/BSR20180772 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref010] 10.Peng F, Li TT, Wang KL, Xiao GQ, Wang JH, Zhao HD, et al. H19/let-7/LIN28 reciprocal negative regulatory circuit promotes breast cancer stem cell maintenance. Cell Death Dis 2017; 8(1):e2569 10.1038/cddis.2016.438 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref011] 11.Le TD, Zhang J, Liu L, Li J. Computational methods for identifying miRNA sponge interactions. Brief Bioinform 2017; 18(4):577–590. 10.1093/bib/bbw042 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref012] 12.Shao T, Wu A, Chen J, Chen H, Lu J, Bai J, et al. Identification of module biomarkers from the dysregulated ceRNA-ceRNA interaction network in lung adenocarcinoma. Mol Biosyst 2015; 11(11):3048–58. 10.1039/c5mb00364d [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref013] 13.Zhang Y, Xu Y, Feng L, Li F, Sun Z, Wu T, et al. Comprehensive characterization of lncRNA-mRNA related ceRNA network across 12 major cancers. Oncotarget 2016; 7(39):64148–67. 10.18632/oncotarget.11637 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref014] 14.Zhang J, Le TD, Liu L, Li J. Inferring miRNA sponge co-regulation of protein-protein interactions in human breast cancer. BMC Bioinformatics 2017; 18(1):243 10.1186/s12859-017-1672-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref015] 15.Wang H, Xu D, Huang H, Cui Y, Li C, Zhang C, et al. Detection of dysregulated competing endogenous RNA modules associated with clear cell kidney carcinoma. Mol Med Rep 2018; 18(2):1963–72. 10.3892/mmr.2018.9189 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref016] 16.Do D, Bozdag S. Cancerin: A computational pipeline to infer cancer-associated ceRNA interaction networks. PLoS Comput Biol 2018; 14(7):e1006318 10.1371/journal.pcbi.1006318 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref017] 17.Zhang J, Liu L, Li J, Le TD. LncmiRSRN: identification and analysis of long non-coding RNA related miRNA sponge regulatory network in human cancer. Bioinformatics 2018; 34(24):4232–40. 10.1093/bioinformatics/bty525 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref018] 18.Deng J, Kong W, Wang S, Mou X, Zeng W. Prior knowledge driven joint NMF algorithm for ceRNA co-module identification. Int J Biol Sci 2018; 14(13):1822–1833. 10.7150/ijbs.27555 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref019] 19.Xiao Q, Luo J, Liang C, Cai J, Li G, Cao B. CeModule: an integrative framework for discovering regulatory patterns from genomic data in cancer. BMC Bioinformatics 2019; 20(1):67 10.1186/s12859-019-2654-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref020] 20.Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9:559 10.1186/1471-2105-9-559 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref021] 21.Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003; 34(2):166–76. 10.1038/ng1165 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref022] 22.Hotelling H. Relations between two sets of variates. Biometrika 1936; 28(3/4):321–377. 10.2307/2333955 [DOI] [Google Scholar]

[pcbi.1007851.ref023] 23.Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 2009; 10(3):515–34. 10.1093/biostatistics/kxp008 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref024] 24.Paci P, Colombo T, Farina L. Computational analysis identifies a sponge interaction network between long non-coding RNAs and messenger RNAs in human breast cancer. BMC Syst Biol 2014; 8:83 10.1186/1752-0509-8-83 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref025] 25.List M, Dehghani Amirabad A, Kostka D, Schulz MH. Large-scale inference of competing endogenous RNA networks with sparse partial correlation. Bioinformatics 2019; 35(14):i596–i604. 10.1093/bioinformatics/btz314 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref026] 26.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 1995; 57(1): 289–300. 10.2307/2346101 [DOI] [Google Scholar]

[pcbi.1007851.ref027] 27.Andersen P, Gill R. Cox's regression model for counting processes, a large sample study. Ann Stat 1982; 10(4):1100–20. 10.1214/aos/1176345976 [DOI] [Google Scholar]

[pcbi.1007851.ref028] 28.Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. Springer-Verlag, New York, 2000. [Google Scholar]

[pcbi.1007851.ref029] 29.Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009; 27(8):1160–7. 10.1200/JCO.2008.18.1370 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref030] 30.Gendoo DM, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 2016; 32(7):1097–9. 10.1093/bioinformatics/btv693 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref031] 31.Hänzelmann S. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013; 14:7 10.1186/1471-2105-14-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref032] 32.Welch BL. The generalisation of student's problems when several different population variances are involved. Biometrika 1947; 34(1–2):28–35. 10.1093/biomet/34.1-2.28 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref033] 33.Tsoumakas G, Katakis I, Vlahavas I. Mining Multi-Label Data In Maimon O. and Rokach L., editors, Data Mining and Knowledge Discovery Handbook, chapter 34, pages 667–685. Springer-Verlag, 2 edition, 2010. ISBN 0387244352. 10.1007/978-0-387-09823-4_34 [DOI] [Google Scholar]

[pcbi.1007851.ref034] 34.Rivolli A, de Carvalho AC. The utiml package: Multi-label classification in R. The R Journal 2018; 10(2): 24–37. 10.32614/RJ-2018-041 [DOI] [Google Scholar]

[pcbi.1007851.ref035] 35.Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM T Intel Syst Tec. 2011; 2(3):1–27. 10.1145/1961189.1961199 [DOI] [Google Scholar]

[pcbi.1007851.ref036] 36.Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: Misc functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien, R package version, 1.7–3. 2019. https://CRAN.R-project.org/package=e1071.

[pcbi.1007851.ref037] 37.Metz J, de Abreu LF, Cherman EA, Monard MC. On the estimation of predictive evaluation measure baselines for multi-label learning. In 13th Ibero-American Conference on AI, pages 189–198, Cartagena de Indias, Colombia, 2012. 10.1007/978-3-642-34654-5_20. [DOI]

[pcbi.1007851.ref038] 38.Hao Y, Wu W, Li H, Yuan J, Luo J, Zhao Y, et al. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford) 2016. 10.1093/database/baw057 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref039] 39.Paraskevopoulou MD, Vlachos IS, Karagkouni D, Georgakilas G, Kanellos I, Vergoulis T, et al. DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts. Nucleic Acids Res 2016; 44(Database issue):D231–D238. 10.1093/nar/gkv1270 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref040] 40.Huang HY, Lin YC, Li J, Huang KY, Shrestha S, Hong HC, et al. miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic Acids Res 2019. 10.1093/nar/gkz896 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref041] 41.Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, et al. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA: mRNA interactions. Nucleic Acids Res 2015; 43(Database issue):D153–D159. 10.1093/nar/gku1215 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref042] 42.Dweep H, Gretz N. miRWalk2.0: a comprehensive atlas of microRNA-target interactions. Nat Methods 2015; 12(8):697 10.1038/nmeth.3485 [DOI] [PubMed] [Google Scholar]

[pcbi.1007851.ref043] 43.Piñero J, Bravo À, Queralt-Rosinach N, Gutiérrez-Sacristán A, Deu-Pons J, Centeno E, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res 2017; 45(Database issue):D833–D839. 10.1093/nar/gkw943 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref044] 44.Forbes SA, Beare D, Boutselakis H, Bamford S, Bindal N, Tate J, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res 2017; 45(Database issue):D777–D783. 10.1093/nar/gkw1121 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref045] 45.Bao Z, Yang Z, Huang Z, Zhou Y, Cui Q, Dong D. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res 2019; 47(Database issue):D1034–D1037. 10.1093/nar/gky905 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref046] 46.Gao Y, Wang P, Wang Y, Ma X, Zhi H, Zhou D, et al. Lnc2Cancer v2.0: updated database of experimentally supported long non-coding RNAs in human cancers. Nucleic Acids Res 2019; 47(Database issue):D1028–D1033. 10.1093/nar/gky1096 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref047] 47.Cui T, Zhang L, Huang Y, Yi Y, Tan P, Zhao Y, et al. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res 2018; 46(Database issue):D371–D374. 10.1093/nar/gkx1025 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref048] 48.Wang P, Zhi H, Zhang Y, Liu Y, Zhang J, Gao Y, et al. miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs. Database (Oxford) 2015. 10.1093/database/bav098 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref049] 49.Pian C, Zhang G, Tu T, Ma X, Li F. LncCeRBase: a database of experimentally validated human competing endogenous long non-coding RNAs. Database (Oxford) 2018. 10.1093/database/bay061 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref050] 50.Wang P, Li X, Gao Y, Guo Q, Wang Y, Fang Y, et al. LncACTdb 2.0: an updated database of experimentally supported ceRNA interactions curated from low- and high-throughput experiments. Nucleic Acids Res 2019; 47(Database issue):D121–D127. 10.1093/nar/gky1144 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref051] 51.Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. Elife 2015; 4 10.7554/eLife.05005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref052] 52.Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, et al. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 2013; 41(Web Server issue):W169–73. 10.1093/nar/gkt393 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref053] 53.Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2014; 42(Database issue):D92–7. 10.1093/nar/gkt1248 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref054] 54.Sticht C, De La Torre C, Parveen A, Gretz N. miRWalk: An online resource for prediction of microRNA binding sites. PLoS One 2018; 13(10):e0206239 10.1371/journal.pone.0206239 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref055] 55.Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics 2017; 33(18):2938–2940. 10.1093/bioinformatics/btx364 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref056] 56.Zhang J, Liu L, Xu T, Xie Y, Zhao C, Li J, et al. miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules. BMC Bioinformatics 2019; 20(1):235 10.1186/s12859-019-2861-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[pcbi.1007851.ref057] 57.Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002; 30(7):1575–1584. 10.1093/nar/30.7.1575 [DOI] [PMC free article] [PubMed] [Google Scholar]

58. Bunte K, Leppäaho E, Saarinen I, Kaski S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 2016; 32(16):2457–63. doi:10.1093/bioinformatics/btw207

59. Klami A, Virtanen S, Leppäaho E, Kaski S. Group factor analysis. IEEE Trans Neural Netw Learn Syst 2015; 26(9):2136–2147. doi:10.1109/TNNLS.2014.2376974

60. Suvitaival T, Parkkinen JA, Virtanen S, Kaski S. Cross-organism toxicogenomics with group factor analysis. Syst Biomed 2014; 2(4):71–80. doi:10.4161/sysb.29291

61. Virtanen S, Klami A, Khan S, Kaski S. Bayesian group factor analysis. In: Lawrence N, Girolami M (eds), Proc. of the 15th International Conference on Artificial Intelligence and Statistics, 2012; pp. 1269–1277.

62. Wang P, Ning S, Zhang Y, Li R, Ye J, Zhao Z, et al. Identification of lncRNA-associated competing triplets reveals global patterns and prognostic markers for cancer. Nucleic Acids Res 2015; 43(7):3478–89. doi:10.1093/nar/gkv233
