Skip to main content
Cancer Informatics logoLink to Cancer Informatics
. 2017 Jun 23;16:1176935117716405. doi: 10.1177/1176935117716405

An Assessment of Database-Validated microRNA Target Genes in Normal Colonic Mucosa: Implications for Pathway Analysis

Martha L Slattery 1,, Jennifer S Herrick 1, John R Stevens 2, Roger K Wolff 1, Lila E Mullany 1
PMCID: PMC5484592  PMID: 28690395

Abstract

BACKGROUND

Determination of functional pathways regulated by microRNAs (miRNAs), while an essential step in developing therapeutics, is challenging. Some miRNAs have been studied extensively; others have limited information. In this study, we focus on 254 miRNAs previously identified as being associated with colorectal cancer and their database-identified validated target genes.

METHODS

We use RNA-Seq data to evaluate messenger RNA (mRNA) expression for 157 subjects who also had miRNA expression data. In the replication phase of the study, we replicated associations between 254 miRNAs associated with colorectal cancer and mRNA expression of database-identified target genes in normal colonic mucosa. In the discovery phase of the study, we evaluated expression of 18 miR-NAs (those with 20 or fewer database-identified target genes along with miR-21-5p, miR-215-5p, and miR-124-3p which have more than 500 database-identified target genes) with expression of 17 434 mRNAs to identify new targets in colon tissue. Seed region matches between miRNA and newly identified targeted mRNA were used to help determine direct miRNA-mRNA associations.

RESULTS

From the replication of the 121 miRNAs that had at least 1 database-identified target gene using mRNA expression methods, 97.9% were expressed in normal colonic mucosa. Of the 8622 target miRNA-mRNA associations identified in the database, 2658 (30.2%) were associated with gene expression in normal colonic mucosa after adjusting for multiple comparisons. Of the 133 miRNAs with database-identified target genes by non-mRNA expression methods, 97.2% were expressed in normal colonic mucosa. After adjustment for multiple comparisons, 2416 miRNA-mRNA associations remained significant (19.8%). Results from the discovery phase based on detailed examination of 18 miRNAs identified more than 80 000 miRNA-mRNA associations that had not previously linked to the miRNA. Of these miRNA-mRNA associations, 15.6% and 14.8% had seed matches for CRCh38 and CRCh37, respectively.

CONCLUSIONS

Our data suggest that miRNA target gene databases are incomplete; pathways derived from these databases have similar deficiencies. Although we know a lot about several miRNAs, little is known about other miRNAs in terms of their targeted genes. We encourage others to use their data to continue to further identify and validate miRNA-targeted genes.

Keywords: Colon cancer, miRNA, functionality, bias, RNA-Seq

Background

MicroRNAs (miRNAs) regulate gene expression by either repressing translation of messenger RNAs (mRNAs) or by inducing mRNA degradation by complementarity binding of target sequences.1 Given their role in gene expression, it is not surprising that miRNAs have been shown to be associated with many physiological processes and have been linked to numerous diseases.210 As such, miRNAs have the potential of being valuable biomarkers for both early disease detection and prognosis as well as serving as targets for therapeutic purposes. An understanding of the functionality of miRNAs and how they relate to various biological pathways is essential to move the field of miRNA research from associations with diseases and conditions to therapeutic and interventional tools.

Determination of functional pathways regulated by miRNAs, while an essential step in developing therapeutics centering on miRNA expression, is challenging. Functional pathway analysis is largely dependent on databases such as miRTarBase which have identified miRNA target genes that have been validated by a variety of methods.11 MicroRNA target interactions (MTIs) incorporated into the database have varying degrees of supporting evidence. The minority of MTIs incorporated into the database have strong evidence and include reporter assays and Western blot methods. Most of the MTIs in the database come from what is considered less stringent methods of target gene identification such as microarray and next-generation sequencing methods, one of which is cross-linking and immunoprecipitation sequencing or cross-linking immunoprecipitation (CLIP)-Seq. Although quantitative polymerase chain reaction (qPCR), microarray, RNA-Seq, and Northern blot methods detect MTI associations through gene expression, other experiments such as reporter assay and Western blot studies measure expression levels of proteins. Comparisons of miRNA expression and mRNA expression patterns have been used as methods to identify target genes.1113

Although the study of miRNAs as they relate to disease is a rapidly expanding field of research, databases that contain information on miRNA-validated targeted genes are dependent on what is known in the literature.

It is not surprising that some miRNAs have been studied extensively, whereas others have limited information. The extent to which information bias regarding miRNA-targeted genes influences our ability to accurately infer functional pathways associated with miRNAs is unexplored. In this study, we examine 254 miRNAs that we have previously identified as being associated with colorectal cancer, colorectal cancer survival, or the microsatellite instability (MSI) tumor phenotype2,3,14,15 with mRNA expression. We compare expression levels between mRNA for previously identified target genes and the miRNA expression level. We analyze separately associations for those target genes previously identified by mRNA expression methods versus those identified by other methods. For a smaller subset of miRNAs with 20 or fewer reported target genes and for 3 miRNAs with hundreds of database-identified target genes, we compare miRNA expression level with all mRNA expression of protein-coding genes in colon tissue. We further compare seed region for those newly discovered mRNAs associated with miRNAs to help identify target genes that may be directly influenced by miRNAs. Figure 1 depicts the study flow.

Figure 1.

Figure 1

Study flow.

Methods

Study participants were part of a colon cancer case-control study that included incident first primary adenocarcinoma of the colon who were diagnosed between 30 and 79 years of age and resided in Utah or were members of the Kaiser Permanente Medical Care Program (KPMCP) in Northern California. Participants were non-Hispanic white, Hispanic, or African American.16 Local Surveillance, Epidemiology, and End Results tumor registries verified all cases that were diagnosed between October 1991 and September 1994. Detailed study methods have been described.3 Participants signed informed consent prior to release of confidential data. The Institutional Review Boards of the University of Utah and the KPMCP approved the study.

RNA Processing

Formalin-fixed paraffin-embedded tissue was used to extract RNA. Normal mucosae adjacent to the carcinoma tissue and matched carcinoma tissue were used to make RNA. Total RNA was extracted, isolated, and purified using the RecoverAll Total Nucleic Acid Isolation Kit (Ambion, Austin, Texas); RNA yields were determined using a NanoDrop spectrophotometer.

MicroRNA

The Agilent Human miRNA Microarray V19.0 was used. The microarray contains probes for 2006 unique human miRNAs as described previously. Data were required to pass stringent quality control (QC) parameters established by Agilent that included tests for excessive background fluorescence, excessive variation among probe sequence replicates on the array, and measures of the total gene signal on the array to assess low signal. If samples failed to meet quality standards for any of these parameters, the sample was relabeled, hybridized to arrays, and rescanned. If a sample failed QC assessment a second time, the sample was deemed to be of poor quality and was excluded from analysis. Our previous analysis has shown that the repeatability associated with this microarray was extremely high (r = 0.98),3 and that comparison of miRNA expression levels obtained from the Agilent microarray with those obtained from qPCR had an agreement of 100% in terms of directionality of findings and that the fold change calculated for the miRNA expression difference between carcinoma and normal colonic mucosa was almost identical.2 Of the 2006 unique human miRNAs assessed on the Agilent microarray, 1226 were expressed in colon carcinoma tissue and 1179 in normal colon mucosa.

To normalize differences in miRNA expression that could be attributed to the array, amount of RNA, location on array, or factors that could erroneously influence miRNA expression levels, total gene signal was normalized by multiplying each sample by a scaling factor,17 which was the median of the 75th percentiles of all the samples divided by the individual 75th percentile of each sample.

RNA-Seq Sequencing Library Preparation and Data Processing

Total RNA was run on 197 carcinoma and normal mucosa pairs; 157 of these passed QC. These samples were taken from the study subjects used for miRNA analysis and were extracted, isolated, and purified in the same manner as previously described.18 RNA library construction was done with the Illumina TruSeq Stranded Total RNA Sample Preparation Kit with Ribo-Zero. The samples were then fragmented and primed for complementary DNA (cDNA) synthesis, adapters were then ligated onto the cDNA, and the resulting samples were then amplified using polymerase chain reaction (PCR); the amplified library was then purified using Agencourt AMPure XP beads. A more detailed description of the methods can be found in our previous work.19

Illumina TruSeq v3 single read flow cell and a 50-cycle single-read sequence run were performed on an Illumina HiSeq instrument. Reads were aligned to a sequence database containing the human genome (build GRCh37/hg19, February 2009 from genome.ucsc.edu). Python and a pysam module were used to calculate counts for each exon and untranslated region (UTR) of the genes using a list of gene coordinates obtained from http://genome.ucsc.edu. Total gene counts were determined. We dropped features that were not expressed in our data or for which the expression was missing for most of the samples.19

Bioinformatics Analysis

Our assessment of miRNAs focused on 254 miRNAs that we have previously reported as being associated with either differences in tumor and normal mucosa expression with a fold change of at least 1.5, survival, or with MSI tumor phenotype.2,3,14 We identified experimentally validated target gene these miRNAs using miRNA-mRNA pairs from miRTarBase v6.0.11 As our miRNA names are those from Agilent v19 (corresponding to miRBase v19) and miRTar-Base v6.0 includes newer associations, we determined the new nomenclature for any miRNAs whose names did not match using archived miRBase (http://www.mirbase.org)20 “miRNA.diff.zip” files.

To determine associations between miRNA-mRNA pairs in colon tissue and to help estimate the extent of completeness of existing databases, we conducted both a replication and discovery analysis.

For the replication component of the study, we determined how many and which target genes were identified using gene expression methods, namely, “microarray,” RNA-Seq, “qRT-PCR,” and “Northern blot” experiments (we refer to gene expression methods as “mRNA-methods”) in miRTarBase and whether the miRNA-mRNA pairs could be replicated in normal colon tissue. Target genes for miRNAs identified by methods other than gene expression (which we refer to as “non-mRNA methods”) were analyzed separately to determine how many of these could be identified by RNA-Seq, an mRNA expression method. Our hypothesis is that those target genes identified by mRNA methods should correlate more positively to our comparison of miRNA with target genes using RNA-Seq data.

For the discovery component of the study, we focused on miRNAs that had 20 or fewer validated targets in miRTarBase as well as 3 commonly validated miRNAs (hsa-miR-21-5p, hsa-miR-124-3p, and hsa-miR-215). We analyzed these miR-NAs with mRNA expression generated by RNA-Seq in our data set excluding those validated target genes that had previously been identified in miRTarBase (see section “Statistical Methods”). We further analyzed miRNAs and targeted mRNAs for seed region matches, and we analyzed the mRNA 3′ UTR FASTA as well as the seed region sequence of the associated miRNA to determine seed region pairings between miRNA and mRNA. MicroRNA seed regions were calculated as described in our previous work,21 and we calculated and included seeds of 6, 7, and 8 nucleotides in length. Our hypothesis is that a seed match would increase the likelihood that identified genes associated with a specific miRNA were more likely to have a direct association given a higher propensity for binding. As miRTarBase uses findings from many different investigations spanning across years and alignments, we used FASTA sequences generated from both GRCh37 and GRCh38 Homo sapiens alignments, using UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables).22 We downloaded FASTA sequences that matched our Ensembl IDs and had a consensus coding sequences available. Analysis was done using scripts in R 3.2.3 and in perl 5.018002.

Statistical Methods

Our final analysis consisted of 157 subjects with high-quality mRNA expression data and high-quality miRNA data. After excluding from the analysis any non–protein-coding mRNAs and those with fewer than 0.5 reads on average across all samples, we were able to examine 17 434 mRNAs that had unique Ensembl IDs. We used the log base 2–transformed RPKM (reads per kilobase per million) normal colonic mucosa mRNA expression data.

We examined the association between the mRNAs and the candidate miRNAs by fitting a linear model and adjusting for age at diagnosis, study center, and sex. P values were generated using the bootstrap method by creating a distribution of 10 000 F statistics derived by resampling the residuals from the null hypothesis model of no association between the mRNAs and miRNAs using the boot package in R. Associations were considered significant if the false discovery rate (FDR)– adjusted P values were less than .05.23

Results

Most of the study population was men, and the average age of study participants was 65.1 years (Table 1). The major molecular tumor phenotypes for the colon cancer study participants were 18.5% MSI, 42.7% with a mutated TP53 gene, 28% with a mutation in KRAS gene, and 27.4% with a CpG island methylator phenotype high tumor. The distribution of targeted genes per miRNA shows that 117 of the 254 miRNAs we were able to evaluate had fewer than 100 target genes, 60 miRNAs had 100 to 200 identified target genes, 51 miRNAs had between 200 and 500 targeted genes, and 26 miRNAs had more than 500 database-identified target genes (Figure 2).

Table 1.

Description of the study population.

NO. (%)
Age Mean (SD) 65.1 (10.2)
Sex Male 88 (56.1)
Female 69 (44.0)
AJCC stage 1 36 (23.1)
2 51 (32.7)
3 50 (32.1)
4 19 (12.2)
Tumor instability MSS 128 (81.5)
MSI 29 (18.5)
TP53 Nonmutated 90 (57.3)
Mutated 67 (42.7)
KRAS Nonmutated 113 (72.0)
Mutated 44 (28.0)
CIMP Low 114 (72.6)
High 43 (27.4)

Abbreviations: AJCC, American Joint Committee on Cancer; CIMP, CpG island methylator phenotype; MSI, microsatellite instability; MSS, microsatellite stable.

Figure 2.

Figure 2

Distribution of database-validated target genes by individual microRNAs (miRNAs).

Replication phase

Of the 121 miRNAs that had at least 1 targeted gene identified by mRNA gene expression methods, 97.9% were expressed in normal colonic mucosa (Table 2 shows 50 of the 121 miR-NAs; Supplemental Table 1 shows all 121 miRNAs). Of the 8622 target miRNA-mRNA associations identified in the database, 2658 (30.2%) were associated with gene expression in normal colonic mucosa after adjusting for multiple comparisons; prior to adjustment for multiple comparisons, 42.9% of database-identified associations were detected. For these miRNAs that had at least 1 targeted gene identified by a gene expression method, 31 399 targeted genes were database validated by non-mRNA methods; of these, 98.3% were expressed in normal colonic mucosa. Of the database-identified target genes identified by non-mRNA expression methods, 48.0% were associated with the targeted gene in our data prior to adjustment for multiple comparisons and 37.6% were significant when an FDR of 0.05 was applied.

Table 2.

Comparison of miRNA-validated targeted genes with mRNA expression to colon RNA-Seq data.

MIRNA MRNA DATABASE–IDENTIFIED TARGET GENES NON-MRNA DATABASE–IDENTIFIED TARGET GENES
TOTAL, N EXPRESSED IN COLON TISSUE, N VALIDATED BY RNA-SEQ PUNADJ < .05 VALIDATED BY RNA-SEQ FDR < 0.05, N TOTAL, N EXPRESSED IN COLON TISSUE, N VALIDATED BY RNA-SEQ PUNADJ < .05, N VALIDATED BY RNA-SEQ FDR < 0.05, N
Total targets 8808 8622 3782 2658 31 399 30 855 15 080 11 795
hsa-let-7a-5p 37 36 17 10 528 522 341 257
hsa-let-7e-5p 12 12 4 1 519 511 165 87
hsa-let-7f-5p 20 20 11 9 325 318 208 173
hsa-let-7g-5p 17 17 11 8 274 269 198 163
hsa-let-7i-5p 5 4 3 3 278 273 242 226
hsa-miR-1-3p 419 411 38 9 424 421 28 7
hsa-miR-10a-5p 11 11 6 5 408 400 229 179
hsa-miR-15a-5p 79 78 37 14 573 566 179 56
hsa-miR-16-5p 108 108 96 89 1339 1320 1159 1077
hsa-miR-17-5p 40 40 27 22 1050 1034 772 661
hsa-miR-19b-3p 12 12 3 1 658 650 96 19
hsa-miR-20a-5p 33 33 23 16 953 939 674 537
hsa-miR-20b-5p 11 11 6 5 838 824 617 488
hsa-miR-21-3p 3 3 3 2 62 61 38 35
hsa-miR-21-5p 438 434 365 319 120 118 93 79
hsa-miR-23a-3p 23 21 16 13 184 181 146 122
hsa-miR-23b-3p 13 12 5 3 282 278 155 102
hsa-miR-24-3p 259 255 217 194 509 492 415 397
hsa-miR-25-3p 16 16 12 12 401 394 337 312
hsa-miR-26a-5p 26 25 19 17 376 374 298 261
hsa-miR-26b-5p 1465 1383 267 91 267 266 61 22
hsa-miR-27a-3p 38 37 30 23 336 328 250 202
hsa-miR-27b-3p 25 24 11 6 345 336 185 135
hsa-miR-28-5p 2 2 0 0 69 68 56 50
hsa-miR-29a-3p 64 64 47 40 146 140 107 97
hsa-miR-29b-3p 68 68 24 12 150 144 68 50
hsa-miR-29c-3p 40 40 10 6 172 167 53 34
hsa-miR-30a-5p 15 15 1 1 658 653 55 15
hsa-miR-30b-5p 16 16 5 1 368 366 114 53
hsa-miR-30c-5p 25 24 4 3 454 451 23 11
hsa-miR-30d-5p 5 5 3 2 337 335 275 243
hsa-miR-30e-5p 7 7 0 0 314 313 32 8
hsa-miR-31-5p 37 36 1 0 132 127 2 0
hsa-miR-34a-5p 67 63 50 43 511 491 452 425
hsa-miR-92a-3p 34 34 31 30 1222 1205 1144 1124
hsa-miR-92b-3p 2 2 0 0 603 593 65 38
hsa-miR-93-5p 28 28 13 12 1114 1099 868 753
hsa-miR-98-5p 470 461 225 111 235 230 103 56
hsa-miR-99a-5p 10 10 3 1 115 114 40 19
hsa-miR-99b-5p 3 3 2 1 42 42 17 7
hsa-miR-100-5p 23 23 8 8 209 207 115 65
hsa-miR-101-3p 23 23 0 0 300 294 1 0
hsa-miR-106b-5p 79 75 14 1 934 920 205 58
hsa-miR-124-3p 1232 1209 1058 918 123 116 105 93
hsa-miR-125b-5p 70 70 50 38 305 303 250 218
hsa-miR-127-3p 7 7 3 2 13 13 5 4
hsa-miR-129-5p 12 12 11 9 353 344 291 267
hsa-miR-130a-3p 20 20 11 9 334 325 157 83
hsa-miR-130b-3p 9 9 0 0 507 497 5 1

Abbreviations: FDR, false discovery rate; mRNA, messenger RNA; miRNA, microRNA.

Evaluation of the 133 miRNAs with database-identified target genes only by non-mRNA expression methods showed that 11 850 of the 12 191 target genes were expressed in normal colonic mucosa (97.2%) (Table 3 shows 50 of the miRNAs; Supplemental Table 2 shows all 133 miRNAs). Of those expressed in normal colonic mucosa, 3770 (30.9%) miRNA-mRNA were associated in normal colonic mucosa using RNA-Seq data prior to adjustment for multiple comparisons. After adjustment for multiple comparisons, 2416 miRNA-mRNA associations remained significant (19.8%); this compares with 30.2% of miRNA:mRNA associations being detected when the original detection was a gene expression method.

Table 3.

Comparison of miRNA targets for 50 miRNAs with non-mRNA expression methods was used to RNA-Seq in colon tissue.

MIRNA TOTAL, N EXPRESSED IN COLON TISSUE, N VALIDATED BY RNA-SEQ PUNADJ < .05, N VALIDATED BY RNA-SEQ DR < 0.05, N
Total target genes 12 191 11 850 3770 2416
hsa-miR-30c-1-3p 421 410 20 6
hsa-miR-139-3p 70 66 1 0
hsa-miR-145-3p 45 44 5 4
hsa-miR-151a-3p 73 71 46 40
hsa-miR-151b 24 24 6 2
hsa-miR-192-3p 121 117 13 5
hsa-miR-204-3p 94 90 1 0
hsa-miR-324-5p 276 272 121 45
hsa-miR-361-3p 121 118 81 73
hsa-miR-378b 27 26 0 0
hsa-miR-378d 27 26 1 0
hsa-miR-378g 57 57 3 1
hsa-miR-378i 27 26 1 0
hsa-miR-425-3p 38 36 22 16
hsa-miR-455-3p 426 416 6 2
hsa-miR-466 157 155 17 9
hsa-miR-501-3p 62 58 4 3
hsa-miR-513c-3p 172 169 124 80
hsa-miR-532-3p 200 196 12 4
hsa-miR-550a-3-5p 81 77 12 4
hsa-miR-550b-2-5p 86 82 0 0
hsa-miR-583 133 129 16 8
hsa-miR-623 212 201 1 0
hsa-miR-652-3p 130 129 95 61
hsa-miR-654-5p 89 84 74 73
hsa-miR-659-5p 23 23 17 13
hsa-miR-662 16 16 0 0
hsa-miR-664a-3p 84 79 0 0
hsa-miR-664a-5p 108 105 13 6
hsa-miR-664b-3p 138 135 99 70
hsa-miR-664b-5p 13 13 8 6
hsa-miR-769-3p 145 140 129 119
hsa-miR-877-5p 210 207 158 118
hsa-miR-892b 54 54 7 2
hsa-miR-934 17 17 1 1
hsa-miR-939-5p 125 120 27 6
hsa-miR-1183 77 75 1 0
hsa-miR-1203 24 23 3 0
hsa-miR-1207-3p 67 63 25 10
hsa-miR-1225-5p 32 30 24 14
hsa-miR-1228-5p 26 26 9 0
hsa-miR-1229-5p 42 40 18 7
hsa-miR-1233-5p 123 122 4 2
hsa-miR-1271-5p 67 66 4 0
hsa-miR-1288-3p 25 25 10 6
hsa-miR-1305 176 173 84 57
hsa-miR-1913 112 110 0 0
hsa-miR-1973 5 5 4 3
hsa-miR-2117 50 48 20 5
hsa-miR-2392 106 104 19 5

Abbreviations: FDR, false discovery rate; mRNA, messenger RNA; miRNA, microRNA.

Discovery phase

Examination of those miRNAs that had fewer than 20 database-identified target genes along with miR-21-5p, miR-215-5p, and miR-124-3p which all have more than 500 database-identified target genes with all genes expressed in normal colonic mucosa gave us an indication as to how the databases compared at both ends of the mRNA target gene spectrum. We showed a large number of miRNA-mRNA associations for that had not previously linked to the miRNA (Table 4). Of the more than 80 000 miRNA-mRNA associations we detected using RNA-Seq data, 15.6% and 14.8% had seed matches for CRCh38 and CRCh37, respectively, supporting the hypothesis that seed matches would increase the likelihood of a direct association given the higher propensity for binding.

Table 4.

miRNA-mRNA associations in colon tissue using RNA-Seq data and matching seed region.

MIRNA DATABASE-VALIDATED TARGET GENES, N VALIDATED TARGET GENES EXPRESSING IN COLON TISSUE, N DATA BASE-VALIDATED TARGET GENES BY MRNA EXPRESSION METHODS, N MIRNA-MRNA ASSOCIATIONS IDENTIFIED IN COLON TISSUE (PUNADJ < .05), N MIRNA-MRNA ASSOCIATIONS IDENTIFIED IN COLON TISSUE (FDR < 0.05), N NEW MIRNA-MRNA ASSOCIATIONS IDENTIFIED (FDR < 0.01) USING CRCH 38 TO IDENTIFY SEED MATCH, N NEW MIRNA-MRNA ASSOCIATIONS IDENTIFIED (FDR < 0.01) USING CRCH37 TO IDENTIFY SEED MATCH
hsa-miR-3677-3p 4 4 0 13 363 13 059 1039 966
hsa-miR-6068 4 4 0 9805 6834 486 450
hsa-miR-1973 5 5 0 6918 1891 231 216
hsa-miR-3181 5 4 0 11 983 11 268 765 688
hsa-miR-4315 14 14 0 728 0 N/A N/A
hsa-miR-664b-5p 13 13 0 11 803 10 472 1876 1757
hsa-miR-4730 15 15 0 8796 4884 564 527
hsa-miR-3621 16 16 0 10 100 7869 657 589
hsa-miR-662 16 16 0 461 0 N/A N/A
hsa-miR-6717-5p 15 15 0 5982 1300 58 56
hsa-miR-4681 17 17 0 123 0 N/A N/A
hsa-miR-572 17 17 1 6178 0 N/A N/A
hsa-miR-934 17 17 0 785 0 N/A N/A
hsa-miR-127-3p 20 20 7 6908 844 41 36
hsa-miR-4787-5p 20 20 0 10 883 9616 1142 1068
hsa-miR-21-5p 558 552 434 8715 6646 1851 1754
hsa-miR-215-5p 713 707 670 2664 0 N/A N/A
hsa-miR-124-3p 1355 1325 1209 13 370 12 860 4961 4815

Abbreviations: FDR, false discovery rate; mRNA, messenger RNA; miRNA, microRNA.

Discussion

Evaluation of 254 miRNAs previously associated with colon cancer, MSI tumor phenotype, or with survival after diagnosis with colorectal cancer showed a great deal of variability in the number of targeted genes for each miRNA in miRTarBase. A major finding is the documentation of the incompleteness of the target genes for many miRNAs. Although our study was limited to colon cancer and could be considered a partial database with respect to miRTarBase, we identified numerous genes whose expression was associated with miRNA expression in colon tissue that were not identified in miRTarBase by similar gene expression methods. We believe that this variability is indicative of the extent to which miRNAs have been studied than actual differences in the number of targeted genes by specific miRNAs. However, the implications for determining functionality and pathways associated with genes targeted by miRNAs resulting from this discrepancy are many; pathways are driven by those miRNAs which have been researched the most. Second, although our partial database is restricted to only colon tissue and gene expression, it points out other limitations. First, our inability to identify target genes incorporated in the database from similar gene expression studies suggests that tissue specificity is important in determine disease-specific pathways. Pathways that encumber non–site-specific genes could obviously be irrelevant for the disease and tissue of interest. In short, although miRNA-mRNA associations may exist in some tissues, they may not be relevant in other tissue types. Our data suggest that 2% to 3% of targeted genes are not expressed in colon tissue and of those miRNA-mRNA associations reported in databases that are expressed in colon tissue, and less than 50% of targeted genes previously identified with mRNA validation methods could be validated with RNA-Seq data in our colon cancer samples. Lack of specificity can limit the accuracy of projected pathways generated by existing databases.

The variability in the number of targeted genes identified in the existing database with the 254 miRNAs that we evaluated is considerable (Figure 2). Although some miRNAs such as miR-21-5p had more than 500 previously identified targeted genes, 15 miRNAs had fewer than 20 targeted genes identified. The effect of this disparity is evident in current pathway tools. For instance, when hsa-miR-21-5p, which had 558 validated targets, is entered into miRPath24 (http://snf-515788.vm.okeanos.grnet.gr), an updated miRNA pathway tool, 34 pathways associated with an FDR correction applied and when using TarBase-validated target genes. When hsa-miR-3677-3p, which had 4 validated targets, is entered using the same parameters, 1 significant pathway is returned. Development of pathways associated with dysregulated miRNAs is thus heavily influenced by those miRNAs that have many genes previously identified and is only minimally represented by many miRNAs that have been examined in less detail and are shown to be associated with diseases. Pathways used to determine functionality of miRNAs or to identify therapeutic targets are incomplete and are dominated by a subset of the total miRNAs associated.

Tissue specificity of miRNA expression, while a concern, appears to have a less impact in terms of pathway identification because 97% to 98% of mRNAs were expressed in our targeted colon tissue. However, a greater concern is that although miR-NAs and mRNAs may be expressed in the tissue, they may not have the same regulatory impact. We saw that less than half of the mRNAs previously identified by gene expression methods were actually associated with their identified target in our data. This lack of reproducibility could be from tissue specificity. However, other reasons for the lack of association could also exist. Because we were looking at gene expression from RNA-Seq, we adjusted for more comparisons than perhaps some of the previously reported studies. It should be kept in mind that the microarray and next-generation sequencing methods are considered less reliable and our lack of confirmation of some of these associations could stem from the level of confidence in data such as these.

Validation of miRNA target genes includes methods that have varying degrees of their strength of evidence as well as what they are validating. Western blot and reporter assays that detect protein levels along with qPCR which measures mRNA expression levels have been considered stronger evidence than microarrays or next-generation sequencing techniques such as RNA-Seq or CLIP-Seq.11 Methods that can validate protein expression and are target-specific miRNAs to determine whether changing miRNA alters mRNA expression level or protein expression are ideal.2527 However, these techniques do not lend themselves to more widespread exploration of targeted genes. Microarrays and RNA-Seq, both methods that can more broadly curate gene expression, detect mRNA expression levels and do not evaluate protein expression. Because miRNAs function through posttranscriptional regulation to effect protein expression, these techniques could fail to identify associations with target genes that might exist. MicroRNAs can also affect mRNA expression through degradation by partial complementarity binding of target sequences. Target gene validation techniques, such as RNA-Seq, would detect variation in mRNA that could be correlated with miRNA expression. It has been stated that RNA-Seq provides “an alternative to microarray gene expression analysis allowing a deeper analysis to provide a larger list of inferring miRNA targets in comparable over-expression studies.”28

Although we have shown that some miRNAs are associated with thousands of mRNAs, we have attempted to identify mRNAs that are more directly associated with the miRNA of interest. It is most likely that many mRNAs are not directly associated with the miRNA of interest, but rather dysregulation is seen secondary to other directly targeted genes.27 To identify direction interactions, we used seed matches between the 5′ region of the mature miRNA from nucleotides 2 to 8, or the seed region, and the 3′ UTR of the mRNA.21 A seed region match suggests a more functional binding between the miRNA and the targeted gene. It also has been suggested by Thomson et al28 that seed matches can provide a mechanism of enriching for direct miRNA targets over indirect or secondary effects. Evaluation of seed region matches between significant miRNA-mRNA associations showed that there were a considerable number of previously unreported target genes (Table 4). This would suggest that many of the associations are indirectly related, but that the databases are incomplete for most miR-NAs when it comes to identifying targeted genes and subsequent pathways constructed from those genes.

Another important consideration when interpreting miRNA and their targeted pathways is the complexity of miR-NAs and their associations with gene expression. MicroRNAs regulate hundreds if not thousands of genes, and individual genes are regulated by many miRNAs. Furthermore, it is unlikely that miRNA-mRNA associations apply to all tissues. The lack of specificity in these associations adds complexity to constructing pathways because it is difficult to know which miRNA-mRNA–targeted pairs are most relevant for the condition being studied.

The study has several strengths. First is our depth of data that includes individuals with both miRNA and RNA-Seq data. This allows us to attempt colon cancer–specific functionality assessment. In addition, using RNA-Seq data to evaluate mRNA expression, we are able to consider many more genes expressed in colon tissue than other methods that are more labor-intensive and tissue-intensive. This allows us to undertake a broader discovery of new miRNA-mRNA associations. However, there are limitations in that associations were detected by non-mRNA methods that may imply translational miRNA mechanisms that would not be detected when looking at gene expression. Availability of protein data would enhance our analysis but is unavailable. We adjusted for multiple comparisons and through this adjustment could have missed previously identified associations. Although we have applied rigid QC and compared subsets of both our miRNA and mRNA data to qPCR with good results,2,29 there is potential for inaccuracies in these techniques.

There are several other considerations when constructing functional pathways associated with miRNAs. Our data suggest that miRNA-targeted gene databases are incomplete; pathways derived from these databases have similar deficiencies. Although we know a lot about several miRNAs, we know very little about other miRNAs in terms of what genes they target. It appears that for most miRNAs, the information is incomplete in terms of validated targeted genes and that tissue-specific associations exist.

Conclusions

Existing databases of miRNA-targeted genes have limitations both in terms of coverage for specific miRNAs and tissue-specific miRNA-mRNA associations. We encourage others to use their data to continue to further identify and validate miRNA-targeted genes to improve the likelihood that research conducted on miR-NAs will help translate to improve medical care.

Supplementary Materials

Acknowledgments

The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official view of the National Cancer Institute. The authors acknowledge Sandra Edwards for data oversight and Dr Bette Caan and the staff at the KPMCP for their contribution to data collection, Dr Wade Samowitz for pathologic slide review and Erica Wolff and Michael Hoffman for miRNA analysis, and Brett Milash and the Bioinformatics Shared Resource of the Huntsman Cancer Institute and University of Utah for miRNA and mRNA bioinformatics data processing.

Footnotes

PEER REVIEW: Three peer reviewers contributed to the peer review report. Reviewers’ reports totaled 885 words, excluding any confidential comments to the academic editor.

FUNDING: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by NCI grants CA163683 and CA48998 from the National Cancer Institute at the National Institutes of Health.

DECLARATION OF CONFLICTING INTERESTS: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Contributions

MLS obtained funding, conducted research, and wrote the manuscript. LEM assisted in writing manuscript and conducted bioinformatics analysis. JSH conducted statistical analysis. JRS provided input into statistical analysis. RKW oversaw laboratory conducting miRNA and RNA-Seq analysis.

Availability of Data and Material

Data are made available in accordance with signed consent forms. Please contact author for data requests.

Ethical Approval and Consent to Participate

Participants signed informed consent prior to release of confidential data. The Institutional Review Boards of the University of Utah and the KPMCP approved the study.

REFERENCES

  • 1.Engels BM, Hutvagner G. Principles and effects of microRNA-mediated post-transcriptional gene regulation. Oncogene. 2006;25:6163–6169. doi: 10.1038/sj.onc.1209909. [DOI] [PubMed] [Google Scholar]
  • 2.Pellatt DF, Stevens JR, Wolff RK, et al. Expression profiles of miRNA subsets distinguish human colorectal carcinoma and normal colonic mucosa. Clin Transl Gastroenterol. 2016;7:e152. doi: 10.1038/ctg.2016.11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Slattery ML, Herrick JS, Pellatt DF, et al. MicroRNA profiles in colorectal carcinomas, adenomas and normal colonic mucosa: variations in miRNA expression and disease progression. Carcinogenesis. 2016;37:245–261. doi: 10.1093/carcin/bgv249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Liu B, Ding JF, Luo J, Lu L, Yang F, Tan XD. Seven protective miRNA signatures for prognosis of cervical cancer. Oncotarget. 2016;7:56690–56698. doi: 10.18632/oncotarget.10678. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Zeng YB, Liang XH, Zhang GX, et al. miRNA-135a promotes hepatocellular carcinoma cell migration and invasion by targeting forkhead box O1. Cancer Cell Int. 2016;16:63. doi: 10.1186/s12935-016-0328-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bandopadhyay M, Sarkar N, Datta S, et al. Hepatitis B virus X protein mediated suppression of miRNA-122 expression enhances hepatoblastoma cell proliferation through cyclin G1-p53 axis. Infect Agent Cancer. 2016;11:40. doi: 10.1186/s13027-016-0085-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Chakraborty C, Chin KY, Das S. miRNA-regulated cancer stem cells: understanding the property and the role of miRNA in carcinogenesis. Tumour Biol. 2016;37:13039–13048. doi: 10.1007/s13277-016-5156-1. [DOI] [PubMed] [Google Scholar]
  • 8.Lee S, Lim S, Ham O, et al. ROS-mediated bidirectional regulation of miRNA results in distinct pathologic heart conditions. Biochem Biophys Res Commun. 2015;465:349–355. doi: 10.1016/j.bbrc.2015.07.160. [DOI] [PubMed] [Google Scholar]
  • 9.Yue J. miRNA and vascular cell movement. Adv Drug Deliv Rev. 2011;63:616–622. doi: 10.1016/j.addr.2011.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Li T, Yang GM, Zhu Y, et al. Diabetes and hyperlipidemia induce dysfunction of VSMCs: contribution of the metabolic inflammation/miRNA pathway. Am J Physiol Endocrinol Metab. 2015;308:E257–E269. doi: 10.1152/ajpendo.00348.2014. [DOI] [PubMed] [Google Scholar]
  • 11.Chou CH, Chang NW, Shrestha S, et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44:D239–D247. doi: 10.1093/nar/gkv1258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Lupini L, Bassi C, Ferracin M, et al. miR-221 affects multiple cancer pathways by modulating the level of hundreds messenger RNAs. Front Genet. 2013;4:64. doi: 10.3389/fgene.2013.00064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wang YP, Li KB. Correlation of expression profiles between microRNAs and mRNA targets using NCI-60 data. BMC Genomics. 2009;10:218. doi: 10.1186/1471-2164-10-218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Slattery ML, Herrick JS, Pellatt DF, et al. Site-specific associations between miRNA expression and survival in colorectal cancer cases. Oncotarget. 2016;7:60193–60205. doi: 10.18632/oncotarget.11173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Slattery ML, Herrick JS, Mullany LE, et al. Colorectal tumor molecular phenotype and miRNA: expression profiles and prognosis. Mod Pathol. 2016;29:915–927. doi: 10.1038/modpathol.2016.73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Slattery ML, Potter J, Caan B, et al. Energy balance and colon cancer—beyond physical activity. Cancer Res. 1997;57:75–80. [PubMed] [Google Scholar]
  • 17.Agilent Technologies I. Agilent GeneSpring User Manual. 2013. [Accessed July 16, 2015]. http://genespring-support.com/files/gs_12_6/GeneSpring-manual.pdf.
  • 18.Slattery ML, Herrick JS, Mullany LE, et al. An evaluation and replication of miRNAs with disease stage and colorectal cancer-specific mortality. Int J Cancer. 2015;137:428–438. doi: 10.1002/ijc.29384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Slattery ML, Pellatt DF, Mullany LE, Wolff RK, Herrick JS. Gene expression in colon cancer: a focus on tumor site and molecular phenotype. Genes Chromosomes Cancer. 2015;54:527–541. doi: 10.1002/gcc.22265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Kozomara AG-JS. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2014;42:D68–D73. doi: 10.1093/nar/gkt1181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Mullany LE, Herrick JS, Wolff RK, Slattery ML. MicroRNA seed region length impact on target messenger RNA expression and survival in colorectal cancer. PLoS ONE. 2016;11:e0154177. doi: 10.1371/journal.pone.0154177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Karolchik D, Hinrichs AS, Furey TS, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–D496. doi: 10.1093/nar/gkh103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300. [Google Scholar]
  • 24.Vlachos IS, Zagganas K, Paraskevopoulou MD, et al. DIANA-miRPath v3.0: deciphering microRNA function with experimental support. Nucleic Acids Res. 2015;43:W460–W466. doi: 10.1093/nar/gkv403. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Kuhn DE, Martin MM, Feldman DS, Terry AV, Jr, Nuovo GJ, Elton TS. Experimental validation of miRNA targets. Methods. 2008;44:47–54. doi: 10.1016/j.ymeth.2007.09.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Martinez-Sanchez A, Murphy CL. MicroRNA target identification-experimental approaches. Biology (Basel) 2013;2:189–205. doi: 10.3390/biology2010189. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Vasudevan S. Functional validation of microRNA-target RNA interactions. Methods. 2012;58:126–134. doi: 10.1016/j.ymeth.2012.08.002. [DOI] [PubMed] [Google Scholar]
  • 28.Thomson DW, Bracken CP, Goodall GJ. Experimental strategies for microRNA target identification. Nucleic Acids Res. 2011;39:6845–6853. doi: 10.1093/nar/gkr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Slattery ML, Pellatt DF, Mullany LE, Wolff RK. Differential gene expression in colon tissue associated with diet, lifestyle, and related oxidative stress. PLoS ONE. 2015;10:e0134406. doi: 10.1371/journal.pone.0134406. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials


Articles from Cancer Informatics are provided here courtesy of SAGE Publications

RESOURCES