Abstract
MicroRNAs (miRNAs) have been shown to be closely related to cancer progression. Traditional methods for discovering cancer-related miRNAs mostly require significant marginal differential expression, but some cancer-related miRNAs may be non-differentially or only weakly differentially expressed. Such miRNAs are called dark matters miRNAs (DM-miRNAs). A previous method targets them through the change in Pearson correlation on miRNA-target interactions (MTIs), but its efficiency relies heavily on restrictive assumptions. In this paper, a novel method was developed to discover DM-miRNAs using a support vector machine (SVM) based not only on the miRNA expression data but also on the expression of the target it regulates. Applied to breast and kidney cancer datasets, the new method found, respectively, 9 and 24 potential DM-miRNAs that cannot be detected by previous methods. Eight and 15 of the newly discovered miRNAs have been found to be associated with breast and kidney cancers, respectively, in existing literature. These results indicate that our new method is more effective in discovering cancer-related miRNAs.
Keywords: cancer-related miRNAs, support vector machine, dark matters, miRNA-target interactions, expression data
Introduction
MicroRNAs (miRNAs) represent a type of small non-coding RNA molecule with about 22 nucleotides found in plants, animals, and viruses that function in post-transcriptional regulation of gene expression and RNA silencing by binding to the 3′ untranslated regions of mRNA.1, 2, 3, 4 miRNAs are abundant in many mammalian cells5,6 and appear to target about 60% of the genes of mammals.7,8 Many miRNAs are evolutionarily conserved, which indicates that they have significant biological functions.9 Research suggests that miRNAs can act as regulators of diverse cellular processes, such as cell differentiation, apoptosis, virus defense, embryonic development, and proliferation.10,11 Furthermore, miRNAs have been implicated in many diseases, such as various types of cancers,12, 13, 14 heart conditions,15 and neurological diseases.16 Up to now, miRNAs have been studied as promising candidates for diagnostic and prognostic biomarkers, as well as predictors of drug responses. For example, miR-1246 is a potential diagnostic and prognostic biomarker in esophageal squamous cell carcinoma (ESCC), and may act as a cell adhesion-related miRNA released from ESCC that affects distant organs.17 Research shows that single-nucleotide polymorphisms (SNPs) in miRNAs and their target sites can impact miRNA biology and affect cancer risk, as well as treatment response.18 It is likely that these SNPs can act as diagnostic and prognostic markers. Thus, discovering pivotal cancer-related miRNAs is an active area of research.
Differential expression (DE) analysis, which performs a two-group comparison for each individual miRNA followed by a multiple comparison correction, may be the most common method of discovering cancer-related miRNAs. For example, in Zhou et al.,19 differentially expressed miRNAs and mRNAs were separately selected as biomarkers using the limma package; in Liao et al.,20 5 miRNAs of 320 differentially expressed mRNAs were used for prognostic signature construction; in Le et al.,21 a causality discovery-based method was used to uncover the causal regulatory relationship between miRNAs and mRNAs. However, some non-differentially or weakly differentially expressed miRNAs may play important regulatory roles in cancer. Pian et al.22 named this type of miRNA “dark matters” miRNA (DM-miRNA) and developed a method to discover DM-miRNAs based on the change of the Pearson correlation coefficient (ΔPCC). However, ΔPCC may fail in some situations. For example, if the correlations between a miRNA and its target in cancer and normal samples are consistent, as in Figure 1A, ΔPCC will be too small to discover this MTI. Also, ΔPCC is based on Pearson correlation, which cannot detect nonlinear associations such as the one in Figure 1B.
Figure 1.
Two Situations that ΔPCC Has Difficulty Handling
Points of two colors represent samples from the normal and cancer groups. (A) Consistent correlation through embedding. (B) Nonlinear association.
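The first failure mode described above can be illustrated with a minimal sketch. It is not the code of Pian et al. or of this paper: the toy data, the helper names (`pearson`, `delta_pcc`, `make_group`), and the offset of 0.5 are all assumptions chosen for illustration. Both groups follow the same negative linear trend, so the correlation barely changes, yet the groups occupy different regions of the joint miRNA-mRNA plane.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def delta_pcc(normal, cancer):
    """Absolute change in PCC between groups; each group is (miRNA, mRNA) lists."""
    return abs(pearson(*cancer) - pearson(*normal))


def make_group(rng, offset, n=50):
    """Toy group: mRNA = -miRNA + offset + small Gaussian noise (hypothetical)."""
    mirna = [rng.uniform(0.0, 1.0) for _ in range(n)]
    mrna = [-x + offset + rng.gauss(0.0, 0.05) for x in mirna]
    return mirna, mrna


rng = random.Random(0)
normal = make_group(rng, offset=0.0)
cancer = make_group(rng, offset=0.5)

# The correlation is strongly negative in both groups, so its change is small.
change = delta_pcc(normal, cancer)
# Yet the line miRNA + mRNA = 0.25 splits the two groups in the joint plane.
normal_below = all(x + y < 0.25 for x, y in zip(*normal))
cancer_above = all(x + y > 0.25 for x, y in zip(*cancer))
print(f"delta PCC = {change:.3f}, separable = {normal_below and cancer_above}")
```

In this toy setting the miRNA alone carries no class information, because its marginal distribution is the same in both groups. That is the kind of signal a classifier on the joint feature is meant to pick up.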
Here, we introduce a machine learning method to discover cancer-related miRNAs. More specifically, support vector machines (SVMs) are used to construct nonlinear class separation boundaries in the two-dimensional space of a miRNA and its experimentally validated target. By focusing on experimentally validated miRNA-target interactions (MTIs), we can avoid many false positives as compared with the DE method on marginal expression. With the ability of SVMs to induce complex decision boundaries, we can accommodate nonlinear or even embedded class relationships as in Figure 1. The classification accuracy (ACC, see definition in Materials and Methods) is used to screen signals and compare different approaches.
Results
Results for Breast Cancer
miRNAs with High Classification Accuracy (S1)
We use the breast cancer expression data of each miRNA as the input feature to train an SVM classifier. Figure 2A shows the miRNAs whose ACC is greater than 0.8. The miRNAs in the red rectangular boxes are not experimentally confirmed to be associated with breast invasive carcinoma (BRCA). The remaining miRNAs have been shown to be associated with breast cancer based on the database HMDD 2.0 and literature mining. The PubMed numbers of these miRNAs are shown in Table 1. Figure 2B is the volcano map of the miRNAs in Figure 2A. We find that most of these miRNAs are not differentially expressed. The results indicate that an SVM based on miRNA expression data alone can discover some of the BRCA-related miRNAs.
Figure 3.
The 2,028 mRNAs Whose ACCs Are Greater Than 0.8 for Breast Cancer
(A) The volcano map of 2,028 mRNAs. Red and blue represent downregulation and upregulation, respectively. (B) The enrichment analyses result of the above 2,028 mRNA genes based on KEGG pathways for BRCA.
Table 1.
The Literature Reports of the Associations between the miRNAs with ACC >0.8 on miRNA Expression and Breast Cancer
miRNA | PubMed No. |
---|---|
miR-139 | 21953071 |
miR-21 | 17531469 |
miR-183 | 23060431 |
miR-145 | 21723890 |
miR-99a | 27212167 |
miR-10b | 22573479 |
miR-96 | 19574223 |
miR-141 | 18376396 |
let-7c | 22388088 |
miR-125b-1 | 19738052 |
miR-204 | 18922924 |
miR-182 | 19574223 |
miR-100 | 22926517 |
miR-592 | 29039599 |
miR-429 | 18376396 |
miR-200a | 20514023 |
miR-125b-2 | 20460378 |
miR-206 | 17312270 |
miR-337 | unknown |
miR-486 | 19946373 |
miR-15b | 25783158 |
miR-551b | unknown |
miR-181b-1 | 23759567 |
miR-383 | 16754881 |
miR-32 | 26276160 |
miR-584 | 23479725 |
miR-133a-1 | 22292984 |
miR-585 | 22328513 |
miR-195 | 30076862 |
miR-200b | 20514023 |
miR-133b | 19946373 |
miR-934 | unknown |
Figure 2.
The 32 miRNAs with ACC >0.8 in Breast Cancer
(A) The relationship between the 32 miRNAs and breast cancer. The miRNAs in the red rectangular boxes are so far not experimentally confirmed to be associated with BRCA. (B) The volcano map of the above 32 miRNAs. Most of these miRNAs are not differentially expressed.
mRNAs with High Classification Accuracy (S2)
We also use the breast cancer expression data of each mRNA as the input feature to train an SVM classifier. Figure 3A describes the DE results of the 2,028 mRNAs whose ACCs are greater than 0.8. DAVID23,24 is employed for enrichment analysis of these 2,028 mRNAs based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results are shown in Figure 3B. Several cancer mechanism-related pathways (such as pathways in cancer, the p53 signaling pathway, prostate cancer, miRNAs in cancer, pancreatic cancer, chronic myeloid leukemia, melanoma, small cell lung cancer, and colorectal cancer) are significantly enriched. These results indicate that the discovered mRNAs play important roles in cancers.
MTIs with High Classification Accuracy (S3)
For each of the 155,044 experimentally verified human MTIs from the miRTarBase database, we use the breast cancer expression data of the miRNA and the mRNA in the interaction as the two features of an SVM. The MTIs with ACC >0.8 are selected as candidate MTIs for discovering cancer-related miRNAs.
Discovery of DM-miRNAs in Breast Cancer
To show why the new method captures more discriminant information, we analyze the MTIs with ACC >0.9 in the miRNA-mRNA joint space whose corresponding marginal ACCs for both the miRNA and the mRNA are <0.8. There are 136 MTIs satisfying these conditions (Table S2). Thus, although the ACCs based on the marginal miRNA feature and the marginal mRNA feature are both unsatisfactory, the classification performance of the corresponding MTI, i.e., the joint feature, is high. Figure 4A shows the 31 miRNAs in the 136 MTIs. The miRNAs in the red rectangular boxes are so far not experimentally confirmed to be associated with BRCA. The PubMed numbers of these miRNAs are shown in Table 2. Most of these 31 miRNAs are related to BRCA and, as shown in Figure 4B, are not differentially expressed; only two are differentially expressed. Figure 4C represents the expression of miR-452 and IRS1 in normal and cancer samples. It is hard to distinguish the normal and cancer samples based only on the single miRNA feature or only on the mRNA expression profile. More specifically, the classification accuracy of using miR-452 or IRS1 alone is 69.61% or 62.55%, respectively. Figure 4D is the scatterplot of miR-452 and IRS1. Compared with either marginal feature, miR-452 or IRS1, classification using the two-dimensional feature of miR-452 and IRS1 is much more effective.
Figure 4.
The 31 miRNAs in 136 MTIs with [ACC(miRNA-mRNA) > 0.9, ACC(miRNA) < 0.8, ACC(mRNA) < 0.8] for Breast Cancer
(A) The relationship between these miRNAs and cancer. The miRNAs in the red rectangular boxes are not experimentally confirmed to be associated with BRCA. (B) The volcano map of the above 31 miRNAs. Only 2 of the 31 miRNAs are differentially expressed. (C) The one-dimensional scatterplot of single miR-452 and IRS1 expression values in normal and cancer samples. The left two lines represent the expression value of miR-452 in BRCA and normal tissues, and the right two lines represent the expression value of IRS1 in BRCA and normal tissues. (D) The two-dimensional scatterplot of the miRNA-mRNA interaction. The abscissa and ordinate represent the expression values of IRS1 and miR-452, respectively.
Table 2.
The Literature Reports of the Associations between DM-miRNAs and Breast Cancer
miRNA | PubMed No. |
---|---|
miR-28 | unknown |
miR-200c | 21224848 |
miR-497 | 27456360 |
miR-335 | 28795314 |
miR-483 | 30186493 |
miR-140 | 23752191 |
miR-1247 | 30249392 |
miR-378c | 26749280 |
miR-144 | 29561704 |
miR-10a | 21955614 |
miR-148b | 23233531 |
miR-1468 | unknown |
miR-193a | 22333974 |
miR-190b | 26141719 |
miR-454 | 27588500 |
miR-340 | 21692045 |
miR-93 | 21955614 |
miR-224 | 22809510 |
miR-296 | 19754881 |
miR-452 | 22353773 |
miR-1301 | 29790898 |
miR-210 | 22952344 |
miR-590 | 29534690 |
miR-130b | 28163094 |
miR-130a | 29384218 |
miR-301b | 21393507 |
miR-98 | 28232182 |
let-7i | 22388088 |
miR-142 | 26657485 |
miR-30a | 22231442 |
miR-421 | 28463794 |
The underlined miRNAs are experimentally confirmed.
If we relax the thresholds in the previous paragraph by analyzing the MTIs with ACC >0.8 in the joint feature and ACC <0.7 in both marginal features, the results are shown in Table 3. The underlined miRNAs are experimentally confirmed to be associated with BRCA. The second and third columns are the fold change (FC) and PubMed numbers of literature reports of these miRNAs, respectively. Most of these miRNAs are not differentially expressed.
Table 3.
The FC and Literature Reports of miRNAs [ACC(miRNA-mRNA) > 0.8, ACC(miRNA) < 0.7, ACC(mRNA) < 0.7] for Breast Cancer
miRNA | FC | PubMed No. |
---|---|---|
miR-30a | 0.065 | 22476851 |
miR-331 | 0.343 | 30063890 |
miR-23b | 0.015 | 22231442 |
miR-17 | 0.091 | 18695042 |
miR-92a-2 | 0.036 | 22563438 |
miR-449a | 3.004 | 27983918 |
miR-134 | 0.095 | 28454346 |
let-7b | 0.035 | 22403704 |
miR-127 | 0.080 | 21409395 |
miR-3127 | 0.507 | unknown |
miR-20a | 0.018 | 22350790 |
miR-30c-2 | 0.070 | 23340433 |
miR-421 | 0.627 | 28463794 |
miR-125a | 0.052 | 23420759 |
miR-186 | 0.048 | unknown |
miR-877 | 1.131 | unknown |
miR-222 | 0.062 | 21553120 |
miR-330 | 0.234 | 29630118 |
The underlined miRNAs are experimentally confirmed.
In summary, compared with the single miRNA or mRNA, paired MTIs contain more biological information. Therefore, the SVM classifier based on the paired miRNA-mRNA features can effectively discover more DM-miRNAs.
We draw receiver operating characteristic (ROC) curves by randomly selecting six MTIs with ACC >0.9 [ACC(miRNA) < 0.8, ACC(mRNA) < 0.8]. Figure 5 shows the classification performance based on the single mRNA, miRNA, and paired MTIs for BRCA. The results indicate that the information of MTIs is more effective. The classification ability of MTIs is significantly better than that of mRNAs and miRNAs. Therefore, MTIs can be effective biomarkers that contain more biological information.
Figure 5.
The ROC Curves of Six MTIs with ACC >0.9 for Breast Cancer
The classification results of miR-452-IRS1, miR-98-PNRC1, miR-98-BCL9, miR-1301-CDCA4, miR-130a-TRIM59, and miR-130b-SMOC1. The black and red lines represent the ROC curve based on the single miRNA and mRNA, respectively. The green line represents the ROC curve based on the paired miRNA-mRNA interaction.
Comparison with DE of miRNAs
To show that SVM can effectively screen potential cancer-related miRNAs, we compared the results of SVM and DE. Table 4 records the top 20 miRNAs in breast cancer ranked by |log2(FC)| based on DE. The results in Table 4 indicate that only 4 of the top 20 miRNAs were confirmed to be associated with breast cancer. The underlined miRNAs are experimentally confirmed to be associated with BRCA. In contrast, Table 1 shows that 19 of the top 20 ACC miRNAs were confirmed to be associated with breast cancer, which indicates that using SVM to select cancer-related miRNAs is more effective.
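For readers unfamiliar with the DE ranking used as the baseline, the following is a minimal sketch of ranking miRNAs by |log2(FC)|. It is not the authors' pipeline: the function names, the pseudocount of 1, and the miRNA names and mean expression values are hypothetical, and a real DE analysis would also include a significance test with multiple comparison correction.

```python
import math


def log2_fold_change(mean_cancer, mean_normal, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean_cancer + pseudocount) / (mean_normal + pseudocount))


def top_by_abs_log2fc(mean_expr, k):
    """Return the k names with the largest |log2(FC)|.

    mean_expr maps a miRNA name to (mean in cancer, mean in normal).
    """
    scores = {
        name: abs(log2_fold_change(cancer, normal))
        for name, (cancer, normal) in mean_expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical group means, for illustration only.
toy_means = {
    "miR-A": (7.0, 1.0),   # upregulated in cancer
    "miR-B": (1.0, 15.0),  # downregulated in cancer
    "miR-C": (3.0, 3.0),   # unchanged
}
print(top_by_abs_log2fc(toy_means, k=2))
```

A ranking of this kind looks only at the marginal expression of each miRNA, which is why miRNAs with little or no marginal change never reach the top of the list.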
Table 4.
The Top 20 |log2(FC)| miRNAs in Breast Cancer
miRNA | |log2(FC)| | PubMed No.a |
---|---|---|
miR-802 | 5.412 | 26080894 |
miR-449c | 4.186 | unknown |
miR-3927 | 4.764 | unknown |
miR-3139 | 4.608 | unknown |
miR-124-2 | 4.458 | unknown |
miR-492 | 4.324 | 25407488 |
miR-573 | 4.253 | 25333258 |
miR-1908 | 4.253 | unknown |
miR-549 | 4.084 | unknown |
miR-3156-2 | 4.034 | unknown |
miR-3156-1 | 4.034 | unknown |
miR-507 | 4.031 | 27167339 |
miR-3180 | 4.017 | unknown |
miR-3612 | 3.982 | unknown |
miR-3925 | 3.829 | unknown |
miR-1302-3 | 3.677 | unknown |
miR-449b | 3.580 | unknown |
miR-3156-3 | 3.569 | unknown |
miR-3148 | 3.568 | unknown |
miR-592 | 3.349 | unknown |
The underlined miRNAs are experimentally confirmed to be associated with BRCA.
The third column represents the PubMed number of literature reports of these miRNAs.
Table 5.
The Literature Reports of the Associations between DM-miRNAs and Kidney Cancer
miRNA | PubMed No. |
---|---|
let-7b | 28694731 |
let-7g | 25951903 |
let-7i | 28694731 |
mir-100 | 28765937 |
mir-154 | 30138594 |
mir-15b | unknown |
mir-183 | 26091793 |
mir-186 | 28550686 |
mir-20b | 26708577 |
mir-214 | 27226530 |
mir-216b | 30231239 |
mir-23b | 20562915 |
mir-26a-1 | 28881158 |
mir-30b | 28536082 |
mir-320a | 27760486 |
mir-335 | 29070041 |
mir-340 | unknown |
mir-369 | unknown |
mir-377 | 25776481 |
mir-483 | unknown |
mir-493 | unknown |
mir-513c | unknown |
mir-625 | unknown |
mir-675 | unknown |
The underlined miRNAs are experimentally confirmed to be associated with kidney cancer.
Results for Kidney Cancer
For comparison with the previous method ΔPCC, we show the results for kidney cancer. As before, we analyze MTIs with ACC >0.9 whose single miRNA and mRNA both have ACC <0.8. A total of 76 such MTIs are selected (Table S3). Table 5 lists the miRNAs in these 76 MTIs. The underlined miRNAs are experimentally confirmed to be associated with kidney cancer. The PubMed numbers of these miRNAs are also shown in Table 5.
We also compare the results of SVM and DE in kidney renal clear cell carcinoma (KIRC). Table 6 records the top 20 |log2(FC)| miRNAs in kidney cancer based on DE. Table 7 records the top 20 ACC miRNAs in kidney cancer based on SVM classifier. The underlined miRNAs are experimentally confirmed to be associated with KIRC. Results in Table 6 indicate that only 3 of the top 20 miRNAs were confirmed to be associated with kidney cancer. However, Table 7 shows that 16 of the top 20 ACC miRNAs were confirmed to be associated with kidney cancer. These results also indicate that using SVM to select cancer-related miRNAs is more effective.
Table 6.
The Top 20 |log2(FC)| miRNAs in Kidney Cancer
miRNA | |log2(FC)| | PubMed No.a |
---|---|---|
miR-1293 | 5.143 | 28338236 |
miR-122 | 5.007 | 23056576 |
miR-875 | 4.582 | unknown |
miR-3166 | 4.523 | unknown |
miR-3202-2 | 4.431 | unknown |
miR-1285-1 | 4.108 | 22294552 |
miR-1231 | 3.869 | unknown |
miR-1250 | 3.832 | unknown |
miR-520b | 3.788 | unknown |
miR-518c | 3.777 | unknown |
miR-3654 | 3.775 | unknown |
miR-219-2 | 3.704 | unknown |
miR-2115 | 3.602 | unknown |
miR-3617 | 3.484 | unknown |
miR-555 | 3.434 | unknown |
miR-548d-2 | 3.413 | unknown |
miR-3662 | 3.302 | unknown |
miR-1910 | 3.289 | unknown |
miR-597 | 3.278 | unknown |
miR-3941 | 3.199 | unknown |
The underlined miRNAs are experimentally confirmed to be associated with KIRC.
The third column represents the PubMed number of literature reports of these miRNAs.
Table 7.
The Top 20 ACC miRNAs in Kidney Cancer
miRNA | ACC (%) | PubMed No.a |
---|---|---|
miR-200c | 98.75 | 29394133 |
miR-141 | 98.51 | 24647573 |
miR-206 | 95.53 | 29410711 |
miR-122 | 94.28 | 29410711 |
miR-129-1 | 94.10 | 24802708 |
miR-129-2 | 93.75 | 28251969 |
miR-629 | 93.21 | 25381221 |
miR-584 | 92.86 | 21119662 |
miR-891a | 92.68 | unknown |
miR-106b | 91.96 | 28423523 |
miR-210 | 91.96 | 29445446 |
miR-181b-1 | 91.43 | unknown |
miR-15a | 90.89 | 28849086 |
miR-934 | 90.54 | unknown |
miR-21 | 90.53 | 29131259 |
miR-429 | 90.35 | 27698878 |
miR-151 | 90.00 | unknown |
miR-181a-1 | 89.82 | 29066014 |
miR-155 | 89.64 | 29228417 |
miR-25 | 89.64 | 29079415 |
The underlined miRNAs are experimentally confirmed to be associated with KIRC.
The third column represents the PubMed number of literature reports of these miRNAs.
Identification of Cancer Types via miRNA-mRNA Association
To verify whether miRNA-mRNA associations can effectively classify cancer types, we designed a multiclass classifier composed of multiple SVM sub-classifiers to distinguish the six cancers and the normal tissues. The miRNA-mRNA pairs with joint ACC >0.8 but marginal ACC <0.7 were selected as the features of the classifiers. The detailed flow chart is in Figure 6. The indices “1–6” represent the six kinds of cancer (lung squamous cell carcinoma [LUSC], lung adenocarcinoma [LUAD], BRCA, thyroid carcinoma [THCA], prostate adenocarcinoma [PRAD], KIRC), respectively. The index “7” represents the pooled paired normal tissue samples. We first divided these seven classes into two subsets. Each subset was then divided into two again, and this process was repeated until every subset contained a single class. Finally, we evaluated the performance of the classifier using 10-fold cross-validation. The accuracies of the seven classes are shown in Table 8. The diagonal elements are the percentages of real LUSC, LUAD, BRCA, THCA, PRAD, KIRC, and normal samples identified correctly. Each off-diagonal element is the percentage of samples of the row class that were assigned to the column class. The results indicate that the miRNA-mRNA associations can identify cancer types with high accuracy.
Figure 6.
The Flow Chart for Constructing the Multiclass Classifier
The numbers 1–6 represent LUSC, LUAD, BRCA, THCA, PRAD, and KIRC, respectively. The number 7 represents the normal tissue samples. The process contains six SVM classifiers. For sample S1, where the type of cancer is not known, if S1 is classified as “1,2,3” using SVM1, then we use SVM2 to judge its type. If S1 is classified as “3,” the final prediction type is BRCA, otherwise S1 needs to be further predicted through SVM4.
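The routing logic of such a tree of binary classifiers can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: only the splits made by SVM1, SVM2, and SVM4 are described in the caption above, so the splits assigned to SVM3, SVM5, and SVM6 here are hypothetical. The decision functions are also stubs that read a known class index from the input, standing in for trained SVMs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

LABELS = {1: "LUSC", 2: "LUAD", 3: "BRCA", 4: "THCA", 5: "PRAD", 6: "KIRC", 7: "Normal"}


@dataclass
class Node:
    name: str
    goes_left: Callable[[Sequence[float]], bool]  # stand-in for a binary SVM
    left: Union["Node", int]                      # subtree or final class index
    right: Union["Node", int]


def predict(node, x):
    """Walk down the tree until a leaf (class index) is reached."""
    while isinstance(node, Node):
        node = node.left if node.goes_left(x) else node.right
    return LABELS[node]


def stub(group):
    """Hypothetical oracle: x[0] holds the true class index (demo only)."""
    return lambda x: int(x[0]) in group


svm4 = Node("SVM4", stub({1}), 1, 2)
svm2 = Node("SVM2", stub({3}), 3, svm4)          # "3" -> BRCA, otherwise SVM4
svm5 = Node("SVM5", stub({4}), 4, 5)             # assumed split
svm6 = Node("SVM6", stub({6}), 6, 7)             # assumed split
svm3 = Node("SVM3", stub({4, 5}), svm5, svm6)    # assumed split
svm1 = Node("SVM1", stub({1, 2, 3}), svm2, svm3)

print(predict(svm1, [3]))
```

With seven classes, a binary tree of this shape always needs six internal nodes, which matches the six SVM classifiers mentioned in the caption.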
Table 8.
The Performance of the Multiclass Classifier by Using 10-Fold Cross-Validation
 | LUSC | LUAD | BRCA | THCA | PRAD | KIRC | Normal |
---|---|---|---|---|---|---|---|
LUSC | 97.28 | 0.81 | 0.29 | 0.16 | 0.28 | 0.54 | 0.64 |
LUAD | 1.83 | 96.22 | 0.52 | 0.35 | 0.41 | 0.36 | 0.31 |
BRCA | 0.12 | 0.29 | 97.16 | 0.84 | 0.42 | 0.58 | 0.59 |
THCA | 0.34 | 0.47 | 0.46 | 97.38 | 0.73 | 0.39 | 0.23 |
PRAD | 0.26 | 0.38 | 0.41 | 0.46 | 97.14 | 0.68 | 0.67 |
KIRC | 0.52 | 0.32 | 0.37 | 0.43 | 0.46 | 97.42 | 0.48 |
Normal | 0.24 | 0.33 | 0.34 | 0.28 | 0.52 | 0.15 | 98.14 |
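A table of this form is a row-normalized confusion matrix: each row corresponds to a true class and sums to 100%. The sketch below shows one way such percentages could be computed from true and predicted labels. The function name and the toy labels are hypothetical, and the paper's actual evaluation code is not available.

```python
def row_percentages(true_labels, predicted_labels, classes):
    """Percentage of each true class (row) assigned to each predicted class (column)."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    table = {}
    for t in classes:
        total = sum(counts[t].values())
        table[t] = {
            p: (100.0 * counts[t][p] / total if total else 0.0) for p in classes
        }
    return table


# Hypothetical labels, for illustration only.
truth = ["BRCA", "BRCA", "BRCA", "BRCA", "KIRC", "KIRC"]
preds = ["BRCA", "BRCA", "BRCA", "KIRC", "KIRC", "KIRC"]
table = row_percentages(truth, preds, classes=["BRCA", "KIRC"])
for true_class, row in table.items():
    print(true_class, row)
```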
Comparison with Other Methods
Pian et al.22 provided a method called ΔPCC to discover potential DM-miRNAs by building the basic miRNA-mRNA network (BMMN) and the basic miRNA-long noncoding RNA (lncRNA) network (BMLN). For breast cancer, 124 miRNAs with high activity scores were obtained by BMMN. In this paper, we obtained 49 miRNAs by integrating Tables 2 and 3. Comparing these 124 and 49 miRNAs, we found that 9 of the 49 miRNAs (hsa-miR-331, hsa-miR-142, hsa-miR-3127, hsa-miR-222, hsa-miR-378c, hsa-miR-92a-2, hsa-miR-421, hsa-miR-125a, and hsa-miR-590) did not appear among the 124 miRNAs. Tables 2 and 3 show that all nine of these miRNAs except hsa-miR-3127 have been confirmed to be associated with breast cancer. For kidney cancer, 70 miRNAs with high activity scores were obtained by BMMN. Only one (let-7b) of the 24 miRNAs in Table 5 appears among these 70 miRNAs. Fifteen of the remaining 23 miRNAs have been confirmed to be associated with kidney cancer. These results indicate that our new method can find cancer-related miRNAs that cannot be discovered by ΔPCC.
Discussion
Cancers have a high incidence globally, and their high mortality rates highlight the urgent need for new treatment methods. miRNAs are important post-transcriptional regulators of gene expression. In cancer, aberrantly expressed miRNAs have significant roles in tumorigenesis and progression. Currently, miRNAs are being studied as biomarkers for diagnosis and prognosis, and as therapeutic tools in cancer. However, some important miRNAs are easily overlooked when the correlations between these miRNAs and their target genes are consistent in cancer and normal samples. To discover these miRNAs, we build SVM classifiers based on potential joint MTIs. Our results indicate that the new method can detect additional cancer-related miRNAs that cannot be detected by previous methods. Our new method should be considered complementary to previous methods. We also find that the edge biomarkers contain more biological information than the node biomarkers. Compared with single miRNA or mRNA biomarkers, edge biomarkers (paired miRNA-mRNA interactions) can more effectively distinguish tumor samples from normal samples. Furthermore, by constructing a classifier with multiple SVM sub-classifiers based on the edge biomarkers, the six cancers can be identified accurately. This provides a new way to further study the classification of tumor sub-types. In conclusion, our method can help discover new cancer-related miRNAs effectively. These results will contribute to developing novel therapeutic candidates in cancers.
Our method also has some limitations. For example, our method is based on the known MTIs from miRTarBase;25 thus, it cannot detect newly gained MTIs that have not been recorded in miRTarBase. To remedy this potential loss, a systematic scan of all miRNA-mRNA pairs may be needed, which will be very computationally costly.
Materials and Methods
Datasets
We studied different types of cancer, including BRCA, KIRC, LUAD, LUSC, THCA, and prostate adenocarcinoma (PRAD). The expression profiles of these six cancers were downloaded from the database of The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga), which includes 1,071 miRNAs and 20,530 mRNAs. The number of cancer samples is shown in Table 9. The 155,044 experimentally validated MTIs (Table S1) and miRNA-disease associations were obtained from the databases miRTarBase and HMDD v.2.0, respectively.25,26
Table 9.
The Type and Sample Number of Six Different Types of Cancer
Cancer Abbreviation | Full Name of Cancer | No. of Cancer Tissue Samples | No. of Paired Normal Tissue Samples |
---|---|---|---|
BRCA | breast invasive carcinoma | 755 | 86 |
KIRC | kidney renal clear cell carcinoma | 255 | 71 |
THCA | thyroid carcinoma | 511 | 59 |
LUAD | lung adenocarcinoma | 445 | 19 |
LUSC | lung squamous cell carcinoma | 342 | 38 |
PRAD | prostate adenocarcinoma | 494 | 52 |
Flow Chart of the Method
The workflow of DM-miRNA discovery is divided into four steps (Figure 7). First, an SVM classifier is constructed for each of the 1,071 miRNAs based on its expression data in cancer and normal tissues, which gives the classification accuracy (ACC) based on each miRNA expression feature. We select miRNAs with high ACC as set S1. In step 2, likewise, the ACC based on each mRNA expression feature is calculated by building 20,530 SVM classifiers. The mRNAs with high ACC are selected as set S2. In step 3, ACCs based on the 155,044 paired miRNA-mRNA expression features are obtained by building 155,044 SVM classifiers. We select paired miRNA-mRNA interactions with high ACC as set S3. Finally, we obtain potential DM-miRNAs by removing from S3 the MTIs that contain miRNAs of S1 or mRNAs of S2.
Figure 7.
The Flow Chart of Our Method
The green modules represent the SVM classification results based on the miRNA expression feature. The miRNAs with high ACC are selected as set S1. The orange modules represent the SVM classification results based on the mRNA expression feature. The mRNAs with high ACC are selected as set S2. The blue modules represent the SVM classification results based on the paired MTIs feature. We select paired MTIs with high ACC as set S3. DM-miRNAs are inferred as the MTIs of S3 after removing those containing miRNAs of S1 or mRNAs of S2.
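The final filtering step of this workflow can be sketched with plain set operations. This is an illustration, not the authors' code. The function name and the threshold defaults (joint ACC >0.9, marginal ACC <0.8, following the breast cancer analysis) are assumptions, and so is the handling of ties at exactly the threshold. In the toy input, only the marginal ACCs of miR-452 (0.6961) and IRS1 (0.6255) come from the text; every other value and the gene names `GENE_X` and `GENE_Y` are hypothetical.

```python
def find_dm_mirnas(acc_mirna, acc_mrna, acc_mti, joint_min=0.9, marginal_max=0.8):
    """Return (kept MTIs, DM-miRNA candidates) from per-feature accuracies.

    acc_mirna: miRNA name -> ACC using the miRNA alone
    acc_mrna:  mRNA name  -> ACC using the mRNA alone
    acc_mti:   (miRNA, mRNA) -> ACC using the joint two-dimensional feature
    """
    s1 = {m for m, acc in acc_mirna.items() if acc >= marginal_max}  # high-ACC miRNAs
    s2 = {g for g, acc in acc_mrna.items() if acc >= marginal_max}   # high-ACC mRNAs
    s3 = {pair for pair, acc in acc_mti.items() if acc > joint_min}  # high-ACC MTIs
    # Drop MTIs of S3 that contain a miRNA of S1 or an mRNA of S2.
    kept = sorted(pair for pair in s3 if pair[0] not in s1 and pair[1] not in s2)
    return kept, sorted({mirna for mirna, _ in kept})


acc_mirna = {"miR-452": 0.6961, "miR-21": 0.95}                      # miR-21 value is hypothetical
acc_mrna = {"IRS1": 0.6255, "GENE_X": 0.91, "GENE_Y": 0.60}          # GENE_* are hypothetical
acc_mti = {
    ("miR-452", "IRS1"): 0.93,    # hypothetical joint ACC
    ("miR-452", "GENE_X"): 0.95,  # removed: GENE_X is in S2
    ("miR-21", "GENE_Y"): 0.97,   # removed: miR-21 is in S1
    ("miR-452", "GENE_Y"): 0.70,  # not in S3
}
kept, dm = find_dm_mirnas(acc_mirna, acc_mrna, acc_mti)
print(kept, dm)
```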
Parameters of the Model
The kernel, cost, and gamma of SVM were set to radial, 1, and 1, respectively. Because the positive (86 normal samples) and negative samples (755 BRCA samples) were unbalanced, we used the random sub-sampling method to balance the data. We sampled the training set and the testing set 20 times. Each time, 40 positive samples and 40 negative samples were randomly chosen to form a training set. The corresponding test set is randomly selected from the remaining positive and negative samples, which guarantees that there is no overlap between the training and testing sets. The SVM classification accuracy (ACC) of the 20 groups of balanced data was obtained. We use the mean value of the 20 ACCs as the final accuracy. The formula for ACC from any testing data is defined as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP (true positive) is the number of positive samples that are identified correctly, FN (false negative) is the number of positive samples that are identified incorrectly, TN (true negative) is the number of negative samples that are identified correctly, and FP (false positive) is the number of negative samples that are identified incorrectly.
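The repeated balanced sub-sampling and the averaging of ACC can be sketched as follows. This is a hedged illustration rather than the authors' code. The test-set size (`n_test=40` per class) is an assumption, because the text does not state it. A simple threshold rule on one feature stands in for the SVM, and the expression values are synthetic.

```python
import random


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def threshold_classifier(train_pos, train_neg):
    """Stand-in for the SVM: cut at the midpoint of the two class means."""
    mean_pos = sum(train_pos) / len(train_pos)
    mean_neg = sum(train_neg) / len(train_neg)
    cut = (mean_pos + mean_neg) / 2
    pos_is_high = mean_pos >= mean_neg
    return lambda x: (x >= cut) == pos_is_high


def mean_acc(pos, neg, repeats=20, n_train=40, n_test=40, seed=0):
    """Average ACC over repeated balanced train/test sub-samples."""
    rng = random.Random(seed)
    accs = []
    for _ in range(repeats):
        # Sampling without replacement keeps training and test sets disjoint.
        p = rng.sample(pos, n_train + n_test)
        n = rng.sample(neg, n_train + n_test)
        is_pos = threshold_classifier(p[:n_train], n[:n_train])
        tp = sum(is_pos(x) for x in p[n_train:])
        fp = sum(is_pos(x) for x in n[n_train:])
        accs.append(accuracy(tp, n_test - fp, fp, n_test - tp))
    return sum(accs) / len(accs)


# Synthetic, well-separated one-dimensional data: 86 positives and 755 negatives.
pos = [10 + 0.01 * i for i in range(86)]
neg = [0.001 * i for i in range(755)]
print(mean_acc(pos, neg))
```

Drawing the same number of samples from each class for training prevents the much larger cancer group from dominating the fitted decision boundary.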
Author Contributions
C.P. and X.F. conceived and designed the study; C.P., S.M., and G.Z. analyzed the data; J.D., S.Y.L., and F.L. contributed ideas and comments; C.P. and X.F. wrote the paper; and all authors read and approved the final manuscript.
Conflicts of Interest
The authors declare no competing interests.
Acknowledgments
This work was supported by the Startup Foundation for Advanced Talents at Nanjing Agricultural University and the Hong Kong Scholars Program (grants 050/804009 and 2017-037) and by three grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Theme-based Research Scheme T12-710/16-R; General Research Funds 14203915 and 14173817).
Footnotes
Supplemental Information can be found online at https://doi.org/10.1016/j.omtn.2020.01.019.
Supplemental Information
Sheet1 in Table S2 records the predictive accuracy of 155,044 MTIs, 1,046 miRNAs, and 20,530 mRNAs based on the expression data of BRCA. Sheet2 lists the 136 MTIs with ACC >0.9 in the miRNA-mRNA joint space for which the corresponding marginal ACCs of both the miRNA and the mRNA are less than 0.8 in BRCA.
Sheet1 in Table S3 records the predictive accuracy of 155,044 MTIs, 1,046 miRNAs, and 20,530 mRNAs based on the expression data of KIRC. Sheet2 lists the 76 MTIs with ACC >0.9 in the miRNA-mRNA joint space for which the corresponding marginal ACCs of both the miRNA and the mRNA are less than 0.8 in KIRC.
References
- 1. Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. doi: 10.1038/nature02871.
- 2. Bartel D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/s0092-8674(04)00045-5.
- 3. Bartel D.P. Metazoan MicroRNAs. Cell. 2018;173:20–51. doi: 10.1016/j.cell.2018.03.006.
- 4. Bartel D.P. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–233. doi: 10.1016/j.cell.2009.01.002.
- 5. Lim L.P., Lau N.C., Weinstein E.G., Abdelhakim A., Yekta S., Rhoades M.W., Burge C.B., Bartel D.P. The microRNAs of Caenorhabditis elegans. Genes Dev. 2003;17:991–1008. doi: 10.1101/gad.1074403.
- 6. Lagos-Quintana M., Rauhut R., Yalcin A., Meyer J., Lendeckel W., Tuschl T. Identification of tissue-specific microRNAs from mouse. Curr. Biol. 2002;12:735–739. doi: 10.1016/s0960-9822(02)00809-6.
- 7. Lewis B.P., Burge C.B., Bartel D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
- 8. Friedman R.C., Farh K.K., Burge C.B., Bartel D.P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19:92–105. doi: 10.1101/gr.082701.108.
- 9. Fromm B., Billipp T., Peck L.E., Johansen M., Tarver J.E., King B.L., Newcomb J.M., Sempere L.F., Flatmark K., Hovig E., Peterson K.J. A Uniform System for the Annotation of Vertebrate microRNA Genes and the Evolution of the Human microRNAome. Annu. Rev. Genet. 2015;49:213–242. doi: 10.1146/annurev-genet-120213-092023.
- 10. Hwang H.W., Mendell J.T. MicroRNAs in cell proliferation, cell death, and tumorigenesis. Br. J. Cancer. 2006;94:776–780. doi: 10.1038/sj.bjc.6603023.
- 11. Cui Q., Yu Z., Purisima E.O., Wang E. Principles of microRNA regulation of a human cellular signaling network. Mol. Syst. Biol. 2006;2:46. doi: 10.1038/msb4100089.
- 12. Hirota T., Date Y., Nishibatake Y., Takane H., Fukuoka Y., Taniguchi Y., Burioka N., Shimizu E., Nakamura H., Otsubo K., Ieiri I. Dihydropyrimidine dehydrogenase (DPD) expression is negatively regulated by certain microRNAs in human lung tissues. Lung Cancer. 2012;77:16–23. doi: 10.1016/j.lungcan.2011.12.018.
- 13. Tavazoie S.F., Alarcón C., Oskarsson T., Padua D., Wang Q., Bos P.D., Gerald W.L., Massagué J. Endogenous human microRNAs that suppress breast cancer metastasis. Nature. 2008;451:147–152. doi: 10.1038/nature06487.
- 14. Akao Y., Nakagawa Y., Naoe T. let-7 microRNA functions as a potential growth suppressor in human colon cancer cells. Biol. Pharm. Bull. 2006;29:903–906. doi: 10.1248/bpb.29.903.
- 15. Thum T., Galuppo P., Wolf C., Fiedler J., Kneitz S., van Laake L.W., Doevendans P.A., Mummery C.L., Borlak J., Haverich A. MicroRNAs in the human heart: a clue to fetal gene reprogramming in heart failure. Circulation. 2007;116:258–267. doi: 10.1161/CIRCULATIONAHA.107.687947.
- 16. Wang W., Kwon E.J., Tsai L.H. MicroRNAs in learning, memory, and neurological diseases. Learn. Mem. 2012;19:359–368. doi: 10.1101/lm.026492.112.
- 17. Takeshita N., Hoshino I., Mori M., Akutsu Y., Hanari N., Yoneyama Y., Ikeda N., Isozaki Y., Maruyama T., Akanuma N. Serum microRNA expression profile: miR-1246 as a novel diagnostic and prognostic biomarker for oesophageal squamous cell carcinoma. Br. J. Cancer. 2013;108:644–652. doi: 10.1038/bjc.2013.8.
- 18. Salzman D.W., Weidhaas J.B. SNPing cancer in the bud: microRNA and microRNA-target site polymorphisms as diagnostic and prognostic biomarkers in cancer. Pharmacol. Ther. 2013;137:55–63. doi: 10.1016/j.pharmthera.2012.08.016.
- 19. Zhou X., Xu X., Wang J., Lin J., Chen W. Identifying miRNA/mRNA negative regulation pairs in colorectal cancer. Sci. Rep. 2015;5:12995. doi: 10.1038/srep12995.
- 20. Liao X., Zhu G., Huang R., Yang C., Wang X., Huang K., Yu T., Han C., Su H., Peng T. Identification of potential prognostic microRNA biomarkers for predicting survival in patients with hepatocellular carcinoma. Cancer Manag. Res. 2018;10:787–803. doi: 10.2147/CMAR.S161334.
- 21. Le T.D., Liu L., Tsykin A., Goodall G.J., Liu B., Sun B.Y., Li J. Inferring microRNA-mRNA causal regulatory relationships from expression data. Bioinformatics. 2013;29:765–771. doi: 10.1093/bioinformatics/btt048.
- 22.Pian C., Zhang G., Wu S., Li F. Discovering the ‘Dark matters’ in expression data of miRNA based on the miRNA-mRNA and miRNA-lncRNA networks. BMC Bioinformatics. 2018;19:379. doi: 10.1186/s12859-018-2410-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Huang W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211. [DOI] [PubMed] [Google Scholar]
- 24.Huang W., Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13. doi: 10.1093/nar/gkn923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Chou C.H., Chang N.W., Shrestha S., Hsu S.D., Lin Y.L., Lee W.H., Yang C.D., Hong H.C., Wei T.Y., Tu S.J. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44(D1):D239–D247. doi: 10.1093/nar/gkv1258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Li Y., Qiu C., Tu J., Geng B., Yang J., Jiang T., Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42:D1070–D1074. doi: 10.1093/nar/gkt1023. [DOI] [PMC free article] [PubMed] [Google Scholar]
Supplementary Materials
Sheet1 in Table S2 records the predictive accuracy (ACC) of 155,044 MTIs, 1,046 miRNAs, and 20,530 mRNAs based on the BRCA expression data. Sheet2 lists the 136 MTIs with ACC > 0.9 in the miRNA-mRNA joint space for which the marginal ACC of both the miRNA and the mRNA is less than 0.8 in BRCA.
Sheet1 in Table S3 records the predictive accuracy (ACC) of 155,044 MTIs, 1,046 miRNAs, and 20,530 mRNAs based on the KIRC expression data. Sheet2 lists the 76 MTIs with ACC > 0.9 in the miRNA-mRNA joint space for which the marginal ACC of both the miRNA and the mRNA is less than 0.8 in KIRC.
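The selection rule behind Sheet2 (joint ACC above 0.9, both marginal ACCs below 0.8) can be illustrated with a minimal Python sketch. The record layout, the function name `select_candidate_mtis`, and the example miRNA and gene names and accuracy values are hypothetical and serve only to show the thresholding; they are not taken from the supplementary tables.

```python
from typing import List, NamedTuple


class MTIAccuracy(NamedTuple):
    """Accuracies for one miRNA-target interaction (hypothetical layout)."""
    mirna: str
    mrna: str
    acc_joint: float  # ACC using miRNA and mRNA expression together
    acc_mirna: float  # marginal ACC using the miRNA alone
    acc_mrna: float   # marginal ACC using the mRNA alone


def select_candidate_mtis(records: List[MTIAccuracy],
                          joint_min: float = 0.9,
                          marginal_max: float = 0.8) -> List[MTIAccuracy]:
    """Keep MTIs that separate the classes well jointly but not marginally."""
    return [
        r for r in records
        if r.acc_joint > joint_min
        and r.acc_mirna < marginal_max
        and r.acc_mrna < marginal_max
    ]


# Made-up example values, for illustration only.
records = [
    MTIAccuracy("miR-A", "GENE1", 0.93, 0.71, 0.65),  # passes all three tests
    MTIAccuracy("miR-B", "GENE2", 0.95, 0.88, 0.60),  # miRNA alone is too accurate
    MTIAccuracy("miR-C", "GENE3", 0.85, 0.70, 0.70),  # joint ACC is too low
]
selected = select_candidate_mtis(records)
print([(r.mirna, r.mrna) for r in selected])
```

Both comparisons are strict, matching the "ACC > 0.9" and "less than 0.8" wording, so an MTI sitting exactly on a threshold is excluded.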