Abstract
Background
With the continuous discovery of microRNA’s (miRNA) association with a wide range of biological and cellular processes, expression profile-based functional characterization of such post-transcriptional regulation is crucial for revealing its significance behind particular phenotypes. Profound advancement in bioinformatics has been made to enable in-depth investigation of miRNA’s role in regulating cellular and molecular events, resulting in a large number of software packages covering different aspects of miRNA functional analysis. Therefore, an all-in-one software solution is in demand for a comprehensive yet highly efficient workflow. Here we present RBiomirGS, an R package for miRNA gene set (GS) analysis.
Methods
The package utilizes multiple databases for target mRNA mapping, estimates miRNA effect on the target mRNAs through miRNA expression profile and conducts a logistic regression-based GS enrichment. Additionally, human ortholog Entrez ID conversion functionality is included for target mRNAs.
Results
By incorporating all the core steps into one package, RBiomirGS eliminates the need for switching between different software packages. The modular structure of RBiomirGS enables various access points to the analysis, with which users can choose the most relevant functionalities for their workflow.
Conclusions
With RBiomirGS, users are able to assess the functional significance of the miRNA expression profile under the corresponding experimental condition with minimal input and intervention. Accordingly, RBiomirGS provides an all-in-one solution for miRNA GS analysis. RBiomirGS is available on GitHub (http://github.com/jzhangc/RBiomirGS). More information, including instructions and examples, can be found on the website (http://kenstoreylab.com/?page_id=2865).
Keywords: Logistic regression, Pathway analysis, Transcriptome, Gene set enrichment, Molecular biology, Post-transcriptional regulation
Introduction
MicroRNA (or miRNA) is a ∼22 nucleotide long small RNA species and is mostly recognized as a negative gene expression regulator on a post-transcriptional level (He & Hannon, 2004). miRNAs have been proposed as biomarkers and/or therapeutic targets for medical disorders such as drug-induced liver injury and cancer (Mitchell et al., 2008; Wang et al., 2009). Additionally, the primary structure of many miRNAs exhibits a high level of conservation across species (Zhang & Storey, 2013), enabling smooth transfer of knowledge between different model systems.
Gene expression gene set (GS) analysis associates expression profiles with the functional outcome under specific experimental conditions and phenotypes. miRNA and coding gene expression GS analyses share the same general goal: to identify the significantly affected biological events from a given expression profile. The commonly used GS databases include gene ontology (GO) term (Ashburner et al., 2000) and KEGG (Kanehisa & Goto, 2000). Several GS techniques have been developed to directly incorporate differential expression (DE) results, such as gene set enrichment analysis (GSEA) (Subramanian et al., 2005). Even though it has been reported that these methods provide a more thorough and complete GS evaluation for coding genes (Mootha et al., 2003; Subramanian et al., 2005), the popular methods for miRNA research still rely on pre-selecting differentially expressed targets. Briefly, the commonly used miRNA GS analysis procedure starts with obtaining the list of the differentially expressed miRNAs, followed by searching for their target mRNAs, and then comparing the mRNA list with the GS databases (Long et al., 2013; Chen et al., 2013). However, it has been demonstrated that such methods and their variations tend to exhibit bias of various origins (Khatri, Sirota & Butte, 2012; Bleazard et al., 2015). Moreover, the information on directionality from these methods is either indirect or lacking. One strategy to tackle the issue is to directly integrate miRNA DE results and transfer the information to the target mRNAs as a quantifiable metric.
There are a variety of computational analysis tools covering various aspects of miRNA studies, ranging from miRNA prediction and miRNA:mRNA interaction prediction to functional annotation (Gomes et al., 2013; Akhtar et al., 2016). As a result, multiple standalone tools are typically required to complete a miRNA GS workflow, e.g., mRNA target mapping, GS database preparation, GS enrichment, and results visualization. In practice, researchers usually face the challenge of constructing a pipeline for each project from multiple software packages and web services, which are poorly connected between steps. Therefore, it is beneficial to establish a bioinformatic solution that searches multiple databases for mRNA target mapping and enables seamless navigation between analysis steps with minimal user intervention. Moreover, it is also critical to provide users with multiple entry points to the pipeline so that it is possible to customize and integrate only the functionalities necessary to their specific workflow. Here we present the R package RBiomirGS, a comprehensive miRNA GS analysis framework capable of performing the following tasks: (i) thorough target mRNA mapping, (ii) calculation of miRNA regulatory effect for target mRNAs, (iii) GS enrichment, and (iv) data visualization.
Methods
As shown in Fig. 1, users provide the miRNA identity list and associated DE results, as well as a GS database file. The mRNA mapping module takes the miRNA list and searches multiple databases for miRNA:mRNA interactions, resulting in either a validated or predicted target mRNA list. Fold change (FC) and p value from the miRNA DE list are then used to calculate a miRNA expression score for each miRNA measured, from which a miRNA impact score for target mRNAs is generated. With the mRNA score and GS database file, GS enrichment is then conducted using logistic regression. The package was built using R version 3.4.0 (R Core Team, 2017).
Target mRNA mapping module
RBiomirGS features a target mRNA mapping module that utilizes multiple miRNA:mRNA interaction databases, whose information is hosted on a SQL server at University of Colorado Cancer Centre (http://multimir.ucdenver.edu/). Information for both predicted and validated miRNA:mRNA interactions can be retrieved from the server. Although a disease research-focused R interface was developed by the host institution for data query (Ru et al., 2014), we assembled our own module for a more general purpose miRNA:mRNA interaction search with additional code optimizations such as parallel computing. The current module takes advantage of multiple databases for a more complete mapping result. For the experimentally validated miRNA:mRNA interactions, miRecords, miRTarBase and TarBase were used (Xiao et al., 2009; Chou et al., 2016; Sethupathy, Corda & Hatzigeorgiou, 2006); whereas DIANA-microT-CDS, ElMMo, MicroCosm (http://www.ebi.ac.uk/enright-srv/microcosm/htdocs/targets/v5/info.html), miRanda (http://microrna.org), miRDB, PicTar, PITA, and TargetScan were searched for predicted interactions (Paraskevopoulou et al., 2013; Gaidatzis et al., 2007; Betel et al., 2008; Wang, 2008; Krek et al., 2005; Kertesz et al., 2007; Lewis, Burge & Bartel, 2005; Grimson et al., 2007; Friedman et al., 2009; Garcia et al., 2011). It is worth noting that DIANA-microT-CDS, PicTar, PITA and TargetScan are skipped for rat miRNAs. Currently, the mapping module supports human, rat and mouse miRNAs.
The core function of the target mRNA mapping module is rbiomirgs_mrnascan. The input for this function is a list of miRNA names following the standard miRNA naming convention (http://www.mirbase.org/help/nomenclature.shtml). The function submits SQL queries to the server using the input miRNA list. The returned results are then output both as R list objects and as csv files in the working directory. By setting the species code (hsa, rno or mmu for human, rat or mouse, respectively), the function will search the databases accordingly. The argument queryType governs whether to search for validated or predicted interactions. For the output file, the universal column elements for both validated and predicted queries include Database, Mature miRNA miRBase accession number, Mature miRNA ID (name), Target gene symbol, Target gene Entrez ID, and Target gene Ensembl ID. The output results file will also contain column elements that are unique to each of the two query types.
miRNA score and mRNA score
The core idea behind the current GS analysis strategy is to quantitatively estimate the miRNA regulatory effect on the target mRNAs, through which the miRNA impact on specific functional gene sets can be evaluated. Based on the initial study by Garcia-Garcia et al. (2016), a miRNA score is first calculated featuring the directionality presented in log FC (or log2FC), and log transformed p value (or −log10(p)). The equation is as follows:
$$S_{mirna} = \operatorname{sign}(\log_2 FC) \times \left(-\log_{10} p\right) \tag{1}$$
As shown in Eq. (1), Smirna combines the sign of log2FC with −log10(p). Integrating the p value and the sign of log2FC ensures that both significance and directionality of the change are taken into consideration. Smirna can be calculated either with or without prior filtering of miRNAs. Although either approach is valid, using the whole miRNA list both reduces the influence of the thresholding method and enables a GS analysis resembling the core principle of a competitive GS enrichment approach (De Leeuw et al., 2016), thereby ensuring high compatibility and statistical power.
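As a rough illustration of the score described above, the following R sketch computes a miRNA score from a small made-up DE table. It assumes the score is the product of the sign of log2FC and −log10(p); the data frame, its column names, the placeholder miRNA names and the helper mirna_score are hypothetical and are not part of RBiomirGS.

```r
# Hypothetical DE table: miRNA names, fold change (FC) and p value.
de <- data.frame(
  miRNA  = c("mmu-miR-a", "mmu-miR-b", "mmu-miR-c"),  # placeholder names
  FC     = c(2, 0.5, 1),
  pvalue = c(0.01, 0.001, 0.5),
  stringsAsFactors = FALSE
)

# Assumed form of the score: sign(log2FC) * -log10(p).
mirna_score <- function(fc, p) {
  sign(log2(fc)) * -log10(p)
}

de$S_mirna <- mirna_score(de$FC, de$pvalue)
print(de)
```

In this toy table the up-regulated miRNA receives a positive score, the down-regulated one a negative score, and an unchanged miRNA (FC = 1) scores zero regardless of its p value.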
Upon obtaining Smirna, the mRNA score (Smrna) can be calculated. The current calculation is a modification of the approach proposed by Garcia-Garcia et al. (2016). Such score is a quantitative representation of the potential regulatory effect on the target mRNAs from miRNAs. The equation is as follows:
$$S_{mrna} = -\sum_{i=1}^{n} w_i \, S_{mirna_i} \tag{2}$$
Equation (2) shows that the Smrna of an mRNA is a sign reversed summation of the Smirna of all the upstream miRNAs. The term n is the number of upstream miRNAs for the mRNA of interest; and w is the miRNA:mRNA affinity score, with values set as 1 by default, i.e., no difference between interactions. However, users can supply such scores as a numeric vector if available.
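The weighted, sign reversed summation can be sketched with a small matrix product in R. This is only an illustration under the assumption that Smrna equals the negative sum of w × Smirna over the upstream miRNAs; the score values, gene names and target map below are invented for the example.

```r
# Hypothetical miRNA scores (named vector) and miRNA:mRNA target map.
s_mirna <- c(mirA = 2, mirB = -3, mirC = 0.5)
targets <- list(
  gene1 = c("mirA", "mirB"),   # opposing miRNAs partly cancel
  gene2 = c("mirA", "mirC"),   # same-direction miRNAs accumulate
  gene3 = "mirB"
)

# Weight matrix W (genes x miRNAs): w = 1 for each interaction, 0 otherwise.
W <- matrix(0, nrow = length(targets), ncol = length(s_mirna),
            dimnames = list(names(targets), names(s_mirna)))
for (g in names(targets)) W[g, targets[[g]]] <- 1

# Assumed form of the score: S_mrna = -sum_i(w_i * S_mirna_i).
s_mrna <- drop(-(W %*% s_mirna))
print(s_mrna)
```

Replacing the 1s in W with interaction-specific affinity values would give the weighted version of the score.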
Logistic regression-based GS enrichment
With Smrna calculated with Eq. (2) and the GS database file, RBiomirGS uses logistic regression to enrich gene sets. Such approach is based on the core concept that a specific gene set is affected if its member genes are also regulated, either at the expression level or by influence from other regulatory factors such as miRNA. Practically, the goal is to assess if a gene can be categorized into a gene set solely based on its Smrna value. As such, the enrichment algorithm models the probability of a gene with a specific Smrna value belonging to a gene set. Mathematically, such probability is represented by the logistic regression sigmoid function (or hypothesis function):
$$P(S_{mrna}) = h_\theta(S_{mrna}) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 S_{mrna})}} \tag{3}$$
As seen in Eq. (3), P is the aforementioned probability, which represents the hypothesis function of logistic regression with parameter vector θ. Transformation of Eq. (3) gives the equation below:
$$\log\frac{P(S_{mrna})}{1 - P(S_{mrna})} = \theta_0 + \theta_1 S_{mrna} \tag{4}$$
Equation (4) shows that the function is the log odds ratio of a gene belonging to the gene set of interest, given the associated Smrna value. Coefficient θ1 stands for the change in the log odds ratio of the gene belonging to the gene set of interest by a unit change in Smrna.
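For readers who want the intermediate step, the transformation can be sketched as follows, assuming a single-predictor model with slope θ1 and an intercept written here as θ0 (the intercept symbol is our notational assumption):

$$P = \frac{1}{1 + e^{-(\theta_0 + \theta_1 S_{mrna})}} \;\Rightarrow\; \frac{P}{1 - P} = e^{\theta_0 + \theta_1 S_{mrna}} \;\Rightarrow\; \log\frac{P}{1 - P} = \theta_0 + \theta_1 S_{mrna}$$

It follows that a unit increase in Smrna multiplies the odds of gene set membership by $e^{\theta_1}$; for example, θ1 = 0.7 corresponds to roughly a two-fold increase in the odds.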
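The model parameter is estimated based on the principle of maximum likelihood (Fu & Li, 1993). Specifically, the following log likelihood function is maximized:
$$\ell(\theta) = \sum_{i=1}^{m} \left[ y_i \log h_\theta(S_{mrna,i}) + (1 - y_i) \log\left(1 - h_\theta(S_{mrna,i})\right) \right]$$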
where y is the dummified membership to the gene set of interest for a gene, with 1 representing a member, 0 otherwise; m is the number of genes tested. RBiomirGS offers multiple optimization algorithms for finding the optimal parameter value for the model, including iteratively reweighted least squares (IWLS), BFGS, and limited memory BFGS-B (L-BFGS-B) (Byrd et al., 1994; Roger, 1987; Wolke & Schwetlick, 1987). This allows users to choose according to the volume of data and available computational power. RBiomirGS utilizes both the generalized linear model (glm) function with logit link function natively included in the R language, and a manual implementation of the logistic regression sigmoid function and log likelihood function. Specifically, the R native glm with logit link function uses IWLS by default; and the other two optimization methods work by applying a general optimization function to the manual logistic regression implementation. To demonstrate the difference in performance with a specific dataset, an analysis of variance (ANOVA) test was conducted on the data from the case study using the statistical analysis R package RBioplot (Zhang & Storey, 2016).
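The two fitting routes described above can be compared on simulated data with base R alone. The sketch below is not the package's internal code: it assumes a single-predictor logistic model, uses glm for the IWLS route and optim with method "BFGS" on a hand-written negative log likelihood for the other; all data and names are made up.

```r
set.seed(1)
# Simulated data: 200 genes, hypothetical mRNA scores and gene set membership.
s_mrna <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 0.8 * s_mrna))

# (a) Base R glm with logit link, fitted by IWLS.
fit_glm <- glm(y ~ s_mrna, family = binomial(link = "logit"))

# (b) Manual negative log likelihood, minimized with optim().
neg_loglik <- function(theta, x, y) {
  eta <- theta[1] + theta[2] * x
  # log(1 + exp(eta)) - y * eta is the negative Bernoulli log likelihood
  sum(log1p(exp(eta)) - y * eta)
}
fit_bfgs <- optim(c(0, 0), neg_loglik, x = s_mrna, y = y, method = "BFGS")

print(coef(fit_glm))
print(fit_bfgs$par)
```

Both routes should arrive at nearly the same intercept and slope; switching method to "L-BFGS-B" in the optim call would exercise the third algorithm named in the text.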
The model significance test is carried out through a Wald test:
$$z = \frac{\hat{\theta}_1}{SE(\hat{\theta}_1)}$$
where $\hat{\theta}_1$ is the model coefficient estimated by the maximum likelihood method; and $SE(\hat{\theta}_1)$ represents the standard error for the estimated model coefficient. The GS p value is then obtained using the z score. For IWLS, t value is used instead to calculate the GS p value with one degree of freedom. All GS p values are then adjusted using a false discovery rate (FDR) (Benjamini & Hochberg, 1995).
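A minimal R sketch of the significance step follows, assuming the Wald statistic is the estimated coefficient divided by its standard error and that a two-sided p value is taken from the standard normal distribution (the two-sided choice is our assumption). The coefficient and standard error values are invented for illustration.

```r
# Hypothetical estimates for four gene sets.
gs <- data.frame(
  gene_set = c("GS1", "GS2", "GS3", "GS4"),
  coef     = c(0.90, -0.40, 0.10, 1.50),
  se       = c(0.30, 0.25, 0.20, 0.40)
)

# Wald z score: estimated coefficient divided by its standard error.
gs$z <- gs$coef / gs$se
# Two-sided p value from the standard normal distribution.
gs$p <- 2 * pnorm(-abs(gs$z))
# Benjamini-Hochberg FDR adjustment across all gene sets tested.
gs$p_adj <- p.adjust(gs$p, method = "fdr")
print(gs)
```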
The calculation of the scores and logistic regression analysis are achieved through the function rbiomirgs_logistic. The scores, along with the GS database file, are then passed to the logistic modelling process. Similar to the target mRNA mapping function, argument objTitle sets the file name prefix. The miRNA DE object can be set using the mirna_DE argument. The arguments var_mirnaName, var_mirnaFC and var_mirnaP are used to set the column elements for miRNA names, FC and p value, respectively. The target mRNA object can then be set using argument mrnalist. The mrna_Weight argument is used to incorporate the miRNA:mRNA interaction weight matrix, if available. The gs_file argument is used to set the GS database file. The parameter optimization algorithm can be set using argument optim_method. By default, FDR is used to adjust the GS p value via argument p.adj. The GS enrichment results are exported as a csv file. A txt file detailing iterations to convergence is also exported if either BFGS or L-BFGS-B is used. The function also outputs the result to the R environment so that data visualization can be carried out.
Data visualization module
The current package includes a data visualization module utilizing the R package ggplot2 (Wickham, 2009). Specifically, the results can be plotted as bar graphs and volcano plots. For bar graphs, two types of plots are featured in the package through function rbiomirgs_bar. The horizontal bar graph depicts the overall distribution of the model coefficient (log odds ratio change per unit Smrna) for all the gene sets tested; whereas the vertical bar graph shows the gene sets with top model coefficient values. The function ranks the absolute coefficient values and plots a user-defined number of top gene sets. The bars show model coefficient ± standard error. Users can choose to only plot the significantly enriched gene sets on the bar graphs, as shown in the case study. The volcano plot is produced by the rbiomirgs_volcano function. Users can set the p value threshold and the number of top gene sets to display on the graph. Additionally, users can freely use other plotting packages to meet their specific data visualization needs.
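The ranking step behind the top gene set bar graph can be sketched as below. This is not the code of rbiomirgs_bar; the results table, its column names and the 0.05 threshold are assumptions made for the example, and the ggplot2 part only runs if that package is installed.

```r
# Hypothetical enrichment results.
res <- data.frame(
  gene_set = paste0("GS", 1:6),
  coef     = c(0.2, -1.4, 0.9, -0.1, 1.1, 0.5),
  se       = c(0.1, 0.3, 0.2, 0.1, 0.3, 0.2),
  p_adj    = c(0.20, 0.001, 0.01, 0.80, 0.02, 0.04),
  stringsAsFactors = FALSE
)

n_top <- 3  # number of gene sets to display (user defined)
sig <- res[res$p_adj < 0.05, ]                    # optional significance filter
top <- head(sig[order(-abs(sig$coef)), ], n_top)  # rank by absolute coefficient

# Build a coefficient +/- standard error bar graph if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(x = reorder(gene_set, coef), y = coef)) +
    geom_col() +
    geom_errorbar(aes(ymin = coef - se, ymax = coef + se), width = 0.2) +
    labs(x = "Gene set", y = "Model coefficient")
  # print(p) renders the plot on the active graphics device.
}
```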
Results
We demonstrate the usage and performance of RBiomirGS using the liver data from a study assessing the role of miRNAs in facilitating daily torpor in hibernating South American marsupials (Hadj-Moussa et al., 2016). The original study assessed 85 miRNAs in the liver and skeletal muscle of aroused and torpid marsupials using a qPCR approach. Given that the miRNome has yet to be fully characterized for the marsupials, the study used mouse miRNA sequences for primer design. Such approach led to successful amplification of all 85 miRNAs in the marsupial. The case study used the mouse databases for target mRNA mapping. All output files can be downloaded and viewed from supplementary materials. The analysis was carried out on an Apple Macbook Pro computer with Intel Core i5 2.7 GHz dual-core CPU and 8 GB memory.
Figure 2A shows the input file layout. Upon importing the data to the R environment (sample data object name: liver), target mRNA mapping is conducted using the rbiomirgs_mrnascan function, through the command line: rbiomirgs_mrnascan(objTitle = "mmu_liver_predicted", mir = liver$miRNA, sp = "mmu", queryType = "predicted", addhsaEntrez = TRUE, parallelComputing = TRUE, clusterType = "FORK"). Figures 2B and 2C show truncated predicted and validated mapping results for miRNA mmu-miR-25a-5p. The mapping results showed that more predicted targets were retrieved than validated targets. The function outputs R objects as well as one csv file per miRNA tested. Since the case study enabled human ortholog Entrez ID conversion functionality, the function exported an R object including the Entrez ID for the human orthologs, with the suffix "_hsa_entrez_list" in the name.
Prior to enrichment, GS database files need to be obtained. For the case study, we used gmt files for KEGG and GO term databases downloaded from MSigDB (http://software.broadinstitute.org/gsea/msigdb). Regarding GO term, separate files were used for biological processes (BP) and molecular function (MF) databases. The case study used the predicted miRNA:mRNA interaction results for enrichment. Furthermore, since all GS database files were based on human genes, we used the human ortholog Entrez ID list. GS enrichment was carried out with the command line (using KEGG database as the example): rbiomirgs_logistic(objTitle = "mirna_mrna_iwls", mirna_DE = liver, var_mirnaName = "miRNA", var_mirnaFC = "FC", var_mirnaP = "pvalue", mrnalist = mmu_liver_predicted_mrna_hsa_entrez_list, mrna_Weight = NULL, gs_file = "kegg.v5.2.entrez.gmt", optim_method = "IWLS", p.adj = "fdr", parallelComputing = TRUE, clusterType = "PSOCK").
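For readers unfamiliar with the gmt layout, each line holds a gene set name, a description, and then the member gene IDs, all tab separated. The R sketch below writes a tiny made-up gmt file, parses it, and builds the dummified membership used as the response in the logistic model. The helper read_gmt and the gene IDs are hypothetical and do not reflect how RBiomirGS reads the file internally.

```r
# Write a tiny, made-up gmt file: name <TAB> description <TAB> gene IDs...
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription_a\t101\t102\t103",
  "SET_B\tdescription_b\t102\t104"
), gmt_path)

# Read each line and split on tabs; fields 3+ are the member gene IDs.
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, `[`, character(1), 1)
  sets
}

gene_sets <- read_gmt(gmt_path)

# Dummified membership (1 = member, 0 = otherwise) for the genes tested.
genes <- c("101", "102", "103", "104", "105")
membership <- sapply(gene_sets, function(s) as.integer(genes %in% s))
rownames(membership) <- genes
print(membership)
```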
We tested all three parameter optimization algorithms on the KEGG analysis to select the most effective method. The KEGG database included 186 pathways. First, with the L-BFGS-B algorithm, the models failed to converge on the liver data for all the gene sets tested. Figure 3 shows a truncated version of the IWLS and BFGS results. The results suggest that both methods led to consistent coefficient values and model significance (Figs. 3 and 4). We found that the IWLS method, with parallel computing enabled in the Unix-exclusive FORK mode, took the least amount of time to converge for KEGG analysis (Fig. 5, based on three repeats). The one-way analysis of variance (ANOVA) test on the computation time suggested the time reduction when using such configuration was significant (Fig. 5).
As such, the subsequent GO term enrichment was also carried out using the IWLS method in FORK mode. The results showed a similar trend to that of the KEGG analysis (Figs. 4 and 6), where more GO terms with a positive model coefficient value were identified.
Discussion
RBiomirGS requires a miRNA identity list, a DE results list, as well as a GS database file as input (Fig. 1). The package uses fold change (FC) and p value to calculate the miRNA score, Smirna. Since the DE results are associated with the miRNAs, both miRNA identity and DE results can be provided in a single csv file. The data layout can be viewed in Fig. 2. In addition, due to the modularization of the package functionality, target mRNA mapping can be used as a standalone function, with a list of miRNA names as input. The GS database file can be downloaded from various sources. One such source is MSigDB, which indexes popular GS databases such as KEGG and GO term. Naturally, databases from other sources can also be used.
To efficiently process high throughput datasets, RBiomirGS implements parallel computing across all major functions. Depending on the user’s computer configuration (i.e., number of CPU cores), parallel computing can provide significant speed enhancements. Moreover, both Unix/Unix-like operating system exclusive FORK and universal PSOCK modes are available for maximizing hardware compatibility. It is worth noting that this feature can be disabled by users. Function rbiomirgs_logistic also implements linear algebra for score calculation to reduce computation time.
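The two cluster modes map onto the type argument of makeCluster in R's base parallel package. The sketch below distributes one logistic fit per simulated gene set across two PSOCK workers and checks the result against a sequential run; the data, the helper fit_one and the choice of two workers are assumptions for illustration and do not show how RBiomirGS organizes its own parallel code.

```r
library(parallel)

# Toy task: fit one logistic model per (simulated) gene set.
set.seed(1)
s_mrna <- rnorm(100)
memberships <- replicate(4, rbinom(100, 1, 0.3), simplify = FALSE)
fit_one <- function(y, score) {
  unname(coef(glm(y ~ score, family = binomial))[2])
}

# PSOCK works on all platforms; FORK is limited to Unix and Unix-like systems.
cl_type <- "PSOCK"
cl <- makeCluster(2, type = cl_type)
par_res <- parLapply(cl, memberships, fit_one, score = s_mrna)
stopCluster(cl)

# Sequential reference for comparison.
seq_res <- lapply(memberships, fit_one, score = s_mrna)
print(all.equal(par_res, seq_res))
```

On Unix-like systems, setting cl_type to "FORK" avoids copying data to freshly started worker processes, which is one plausible reason for the shorter run time reported for that mode.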
The target mRNA mapping module also features an optional gene Entrez ID conversion functionality that searches for human gene orthologs on Ensembl databases for rodent models (i.e., mouse or rat). Given the high conservation level in miRNA primary structure across species, such a function makes it possible to reveal the miRNA functional implication in human from rodent models. The human Entrez ID conversion function is built upon the open-source BioMart platform (Durinck et al., 2005; Durinck et al., 2009). By integrating BioMart software into the package, RBiomirGS connects directly to the Ensembl database (http://www.ensembl.org) for human ortholog search using the most up-to-date information. While beneficial, such configuration imposes one limitation of the package wherein an active and functional internet connection is required for the target mRNA mapping function.
RBiomirGS conducts GS analysis through mRNA scores, miRNA scores and logistic regression. The mRNA score Smrna is based on the assumption that, in most cases, miRNAs inhibit target mRNA translation events. Therefore, Smrna represents the inhibitory effect on the mRNA of interest. As the sign reversed summation of Smirna, the biological interpretation of Smrna can be described as the following: In the case of a two-group comparison (i.e., experimental vs control), a positive Smrna means the mRNA of interest might be inhibited more in the control group, whereas a negative value means the mRNA might be under miRNA inhibition upon experimental conditions. In addition, a bigger absolute value represents a stronger miRNA inhibitory effect. Given that Smirna contains directionality information, such approach allows for accumulation and cancelation effects on the mRNA when the mRNA of interest is targeted by multiple miRNAs. Since the strength of the interaction between miRNA and mRNA varies among different miRNAs, it is critical to incorporate such consideration into the Smrna calculation, regardless of the availability of such measurement. Therefore, we added the weight term w to Eq. (2) to accommodate the affinity of the miRNA:mRNA interaction, should such metric be available.
The central goal of the current logistic regression-based classification modelling is to separate the members of a gene set from the rest of the genes using Smrna, which represents the overall miRNA regulatory effect. If a gene can be categorized into a gene set solely based on its Smrna, then said gene set is under miRNA-dependent regulation. As such, based on the model significance test and user customizable GS p value threshold (e.g., FDR adjusted p value < 0.05 by default), a GS model with a significant adjusted p value means that the membership to such gene set for a gene can be determined based on its Smrna, or that the gene set is significantly impacted by miRNA regulation. The biological interpretation of the model coefficient from Eq. (4) can be stated as follows (again, in the context of two-group comparison, i.e., experimental vs control): if the coefficient is positive, miRNA inhibition on target mRNAs might be lifted, thereby leading to less suppression on the gene set of interest in the experimental group. Furthermore, with a positive coefficient, a unit increase in Smrna results in an increased odds ratio of a gene belonging to the gene set of interest. Conversely, a negative value means the opposite. It needs to be clarified that a positive model coefficient for a gene set means that the gene set of interest might be under more miRNA-dependent inhibition in the control group, as opposed to being activated under the experimental condition. Such observation is closely related to the fact that the miRNA regulation on a pathway is mostly indirect, and represents only one layer of regulation on the mRNAs. As such, another limitation of RBiomirGS is in its limited capacity for evaluating gene set activation when solely relying on miRNA DE results.
The case study demonstrated the usage of RBiomirGS. In general, enrichment on all three GS databases suggested that more gene sets were free from miRNA-dependent inhibition in the livers of torpid marsupials, represented by positive model coefficient values (Figs. 4 and 6). The result is consistent with the observation from the original study where most miRNAs tested showed decreased relative expression levels in liver (Hadj-Moussa et al., 2016), leading to less inhibitory effect on their target mRNAs, which in turn resulted in more gene sets being released from miRNA-dependent regulation. For example, such enriched KEGG pathways included mTOR signaling pathway and MAPK signaling pathway, which, when activated, were considered to play critical roles in facilitating torpor (Hadj-Moussa et al., 2016). However, the volcano plots in Figs. 4 and 6 suggest that potentially inhibited gene sets in the liver from torpid marsupials exhibited a greater impact by the miRNA, i.e., a wider spread pattern on the x-axis in the negative direction. The KEGG pathways that might be suppressed included Ribosome (KEGG ID: map03010), RNA polymerase (KEGG ID: map03020), Oxidative phosphorylation (KEGG ID: map00190), and Pyruvate metabolism (KEGG ID: map00620). Inhibition of those pathways may contribute to suppressing ATP-expensive cellular processes such as global gene transcription and protein synthesis, all of which have been reported to be inhibited in other hibernating animals (Storey, 2010; Wu & Storey, 2016). It is also not surprising that oxidative phosphorylation and pyruvate metabolism pathways were inhibited under hypometabolic conditions (Storey, 1997). Overall, by using RBiomirGS, additional miRNA-dependent regulatory mechanisms that underpin the molecular adaptations facilitating daily torpor in marsupials were revealed.
By incorporating all the core steps into one R package, RBiomirGS eliminates the need for switching between different software packages, or between different software platforms. The package also provides two data visualization functions that can produce three types of plots. Furthermore, the modular structure of RBiomirGS enables various access points to the analysis, with which users can choose the most relevant functionalities for their workflow. With RBiomirGS, users will be able to comprehensively assess the functional implications of the miRNA expression profile under the corresponding experimental condition with minimal input and intervention. Accordingly, RBiomirGS provides an all-in-one and highly accessible miRNA GS analysis solution.
Supplemental Information
Acknowledgments
We thank the Storey lab members for testing the package.
Funding Statement
The present study is supported by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Kenneth B Storey (grant number: 6793). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
Kenneth Storey is an Academic Editor for PeerJ.
Author Contributions
Jing Zhang conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Kenneth B. Storey reviewed drafts of the paper.
Data Availability
The following information was supplied regarding data availability:
GitHub: http://github.com/jzhangc/RBiomirGS.
References
- Akhtar et al. (2016).Akhtar MM, Micolucci L, Islam MS, Olivieri F, Procopio AD. Bioinformatic tools for microRNA dissection. Nucleic Acids Research. 2016;44(1):24–44. doi: 10.1093/nar/gkv1221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ashburner et al. (2000).Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Benjamini & Hochberg (1995).Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 1995;57:289–300. [Google Scholar]
- Betel et al. (2008).Betel D, Wilson M, Gabow A, Marks DS, Sander C. The microRNA.org resource: targets and expression. Nucleic Acids Research. 2008;36(Database issue):D149–D153. doi: 10.1093/nar/gkm995. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bleazard et al. (2015).Bleazard T, Lamb JA, Griffiths-Jones S, Griffiths-Jones S. Bias in microRNA functional enrichment analysis. Bioinformatics. 2015;31:1592–1598. doi: 10.1093/bioinformatics/btv023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Byrd et al. (1994).Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing. 1994;16:1190–1208. doi: 10.1137/0916069. [DOI] [Google Scholar]
- Chen et al. (2013).Chen M, Zhang X, Liu J, Storey KB. High-throughput sequencing reveals differential expression of miRNAs in intestine from sea cucumber during aestivation. PLOS ONE. 2013;8:e76120. doi: 10.1371/journal.pone.0076120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chou et al. (2016).Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, Lee WH, Yang CD, Hong HC, Wei TY, Tu SJ, Tsai TR, Ho SY, Jian TY, Wu HY, Chen PR, Lin NC, Huang HT, Yang TL, Pai CY, Tai CS, Chen WL, Huang CY, Liu CC, Weng SL, Liao KW, Hsu WL, Huang HD. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Research. 2016;44:D239–D247. doi: 10.1093/nar/gkv1258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- De Leeuw et al. (2016).De Leeuw CA, Neale BM, Heskes T, Posthuma D. The statistical properties of gene-set analysis. Nature Reviews Genetics. 2016;17:353–364. doi: 10.1038/nrg.2016.29. [DOI] [PubMed] [Google Scholar]
- Durinck et al. (2005).Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005;21:3439–3440. doi: 10.1093/bioinformatics/bti525. [DOI] [PubMed] [Google Scholar]
- Durinck et al. (2009).Durinck S, Spellman P, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols. 2009;4:1184–1191. doi: 10.1038/nprot.2009.97. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friedman et al. (2009).Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Research. 2009;19:92–105. doi: 10.1101/gr.082701.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fu & Li (1993).Fu YX, Li WH. Maximum likelihood estimation of population parameters. Genetics. 1993;134:1261–1270. doi: 10.1093/genetics/134.4.1261. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gaidatzis et al. (2007).Gaidatzis D, Van Nimwegen E, Hausser J, Zavolan M. Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics. 2007;8:69. doi: 10.1186/1471-2105-8-69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garcia et al. (2011).Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP. Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nature Structural & Molecular Biology. 2011;18:1139–1146. doi: 10.1038/nsmb.2115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garcia-Garcia et al. (2016). Garcia-Garcia F, Panadero J, Dopazo J, Montaner D. Integrated gene set analysis for microRNA studies. Bioinformatics. 2016;32:2809–2816. doi: 10.1093/bioinformatics/btw334.
- Gomes et al. (2013). Gomes CP, Cho JH, Hood L, Franco OL, Pereira RW, Wang K. A review of computational tools in microRNA discovery. Frontiers in Genetics. 2013;4:81. doi: 10.3389/fgene.2013.00081.
- Grimson et al. (2007). Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Molecular Cell. 2007;27:91–105. doi: 10.1016/j.molcel.2007.06.017.
- Hadj-Moussa et al. (2016). Hadj-Moussa H, Moggridge JA, Luu BE, Quintero-Galvis JF, Gaitán-Espitia JD, Nespolo RF, Storey KB. The hibernating South American marsupial, Dromiciops gliroides, displays torpor-sensitive microRNA expression patterns. Scientific Reports. 2016;6:24627. doi: 10.1038/srep24627.
- He & Hannon (2004). He L, Hannon GJ. MicroRNAs: small RNAs with a big role in gene regulation. Nature Reviews Genetics. 2004;5:522–531. doi: 10.1038/nrg1379.
- Kanehisa & Goto (2000). Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
- Kertesz et al. (2007). Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E. The role of site accessibility in microRNA target recognition. Nature Genetics. 2007;39:1278–1284. doi: 10.1038/ng2135.
- Khatri, Sirota & Butte (2012). Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLOS Computational Biology. 2012;8:e1002375. doi: 10.1371/journal.pcbi.1002375.
- Krek et al. (2005). Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, Da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N. Combinatorial microRNA target predictions. Nature Genetics. 2005;37:495–500. doi: 10.1038/ng1536.
- Lewis, Burge & Bartel (2005). Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
- Long et al. (2013). Long C, Jiang L, Wei F, Ma C, Zhou H, Yang S, Liu X, Liu Z. Integrated miRNA-mRNA analysis revealing the potential roles of miRNAs in chordomas. PLOS ONE. 2013;8:e66676. doi: 10.1371/journal.pone.0066676.
- Mitchell et al. (2008). Mitchell PS, Parkin RK, Kroh EM, Fritz BR, Wyman SK, Pogosova-Agadjanyan EL, Peterson A, Noteboom J, O’Briant KC, Allen A, Lin DW, Urban N, Drescher CW, Knudsen BS, Stirewalt DL, Gentleman R, Vessella RL, Nelson PS, Martin DB, Tewari M. Circulating microRNAs as stable blood-based markers for cancer detection. Proceedings of the National Academy of Sciences of the United States of America. 2008;105:10513–10518. doi: 10.1073/pnas.0804549105.
- Mootha et al. (2003). Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics. 2003;34:267–273. doi: 10.1038/ng1180.
- Paraskevopoulou et al. (2013). Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, Filippidis C, Dalamagas T, Hatzigeorgiou AG. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Research. 2013;41(Web Server issue):W169–W173. doi: 10.1093/nar/gkt393.
- R Core Team (2017). R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; Vienna: 2017.
- Roger (1987). Roger F. Practical methods of optimization. Second Edition. John Wiley & Sons; New York: 1987.
- Ru et al. (2014). Ru Y, Kechris KJ, Tabakoff B, Hoffman P, Radcliffe RA, Bowler R, Mahaffey S, Rossi S, Calin GA, Bemis L, Theodorescu D. The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations. Nucleic Acids Research. 2014;42:e133. doi: 10.1093/nar/gku631.
- Sethupathy, Corda & Hatzigeorgiou (2006). Sethupathy P, Corda B, Hatzigeorgiou AG. TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12:192–197. doi: 10.1261/rna.2239606.
- Storey (1997). Storey KB. Metabolic regulation in mammalian hibernation: enzyme and protein adaptations. Comparative Biochemistry and Physiology Part A: Physiology. 1997;118:1115–1124. doi: 10.1016/S0300-9629(97)00238-7.
- Storey (2010). Storey KB. Out cold: biochemical regulation of mammalian hibernation - a mini-review. Gerontology. 2010;56:220–230. doi: 10.1159/000228829.
- Subramanian et al. (2005). Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
- Wang et al. (2009). Wang K, Zhang S, Marzolf B, Troisch P, Brightman A, Hu Z, Hood LE, Galas DJ. Circulating microRNAs, potential biomarkers for drug-induced liver injury. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:4402–4407. doi: 10.1073/pnas.0813371106.
- Wang (2008). Wang X. miRDB: a microRNA target prediction and functional annotation database with a wiki interface. RNA. 2008;14:1012–1017. doi: 10.1261/rna.965408.
- Wickham (2009). Wickham H. ggplot2: elegant graphics for data analysis. Springer; New York: 2009.
- Wolke & Schwetlick (1987). Wolke R, Schwetlick H. Iteratively reweighted least squares: algorithms, convergence analysis, and numerical comparisons. SIAM Journal on Scientific and Statistical Computing. 1987;9:907–921. doi: 10.1137/0909062.
- Wu & Storey (2016). Wu CW, Storey KB. Life in the cold: links between mammalian hibernation and longevity. Biomolecular Concepts. 2016;7:41–52. doi: 10.1515/bmc-2015-0032.
- Xiao et al. (2009). Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Research. 2009;37(Database issue):D105–D110. doi: 10.1093/nar/gkn851.
- Zhang & Storey (2013). Zhang J, Storey KB. Akt signaling and freezing survival in the wood frog, Rana sylvatica. Biochimica et Biophysica Acta/General Subjects. 2013;1830:4828–4837. doi: 10.1016/j.bbagen.2013.06.020.
- Zhang & Storey (2016). Zhang J, Storey KB. RBioplot: an easy-to-use R pipeline for automated statistical analysis and data visualization in molecular biology and biochemistry. PeerJ. 2016;4:e2436. doi: 10.7717/peerj.2436.
Data Availability Statement
The following information was supplied regarding data availability:
GitHub: http://github.com/jzhangc/RBiomirGS.